Author: Benjamin Bucklin Brown
Many embedded systems are deployed in places that are difficult or impractical for operators to reach. This is especially true for Internet of Things (IoT) applications, which are usually deployed in large numbers and have limited battery life. Examples include embedded systems that monitor the health of people or machines. These challenges, coupled with the rapid, iterative software life cycle, mean that many systems need to support over-the-air (OTA) updates. An OTA update replaces the software on the microcontroller or microprocessor of the embedded system with new software. Although many people are familiar with OTA updates on mobile devices, designing and implementing them on resource-constrained systems brings a different set of challenges. This article introduces several software designs for OTA updates and discusses their advantages and disadvantages. It also shows how OTA update software can take advantage of the hardware features of two ultra-low power microcontrollers.
Server and client
An OTA update replaces the current software on a device with new software that is downloaded wirelessly. In an embedded system, the device running this software is usually a microcontroller: a small computing device with limited memory, speed, and power budget. A microcontroller usually contains a microprocessor (core) and digital hardware modules (peripherals) that perform specific operations. Ultra-low power microcontrollers with a typical active-mode power consumption of 30 μA/MHz to 40 μA/MHz are ideal for this type of application. Using specific hardware peripherals on these microcontrollers and placing the device in low-power modes is an important part of OTA update software design. Figure 1 shows an example of an embedded system that may require OTA updates. A microcontroller is connected to a radio and sensors, as might be the case in an IoT application that uses the sensors to collect data about the environment and the radio to report that data periodically. This part of the system is called the edge node or client and is the target of the OTA update. The other part of the system is called the cloud or server and is the provider of the new software. The server and the client communicate over a wireless connection using transceivers (radios).
Figure 1. Server/client architecture in a sample embedded system
What is a software application?
Most of the OTA update process consists of transferring the new software from the server to the client. The software is transmitted as a sequence of bytes after it has been converted from source format to binary format. The conversion process compiles the source code files (for example, .c or .cpp), links them into an executable (for example, .exe or .elf), and then converts the executable into a portable binary file format (for example, .bin or .hex). At a high level, these file formats contain a sequence of bytes, each of which belongs at a specific address in the microcontroller’s memory. We usually think of the information sent over a wireless link as data, such as commands that change the state of the system or sensor data collected by the system. In an OTA update, the data is the new software in binary format. In many cases, the binary file is too large to send from the server to the client in a single transfer, so it must be split across multiple packets. This process is called packetizing. To illustrate, Figure 2 shows how different versions of the software produce different binary files and therefore different packets to send during the OTA update. In this simple example, each packet contains 8 bytes: the first 4 bytes are the address in the client’s memory at which the last 4 bytes are to be stored.
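The packet layout in this simple example can be sketched in a few lines of Python. This is a minimal illustration rather than the format used by any particular product: the function names, the little-endian address encoding, and the 0xFF padding of the final chunk are all assumptions made for the sketch.

```python
import struct
from typing import List

def packetize(image: bytes, base_address: int, chunk_size: int = 4) -> List[bytes]:
    """Split a binary image into packets of <4-byte address><chunk_size data bytes>."""
    packets = []
    for offset in range(0, len(image), chunk_size):
        chunk = image[offset:offset + chunk_size]
        # Assumption: pad a short final chunk with 0xFF (a common erased-flash value).
        chunk = chunk.ljust(chunk_size, b"\xff")
        # Assumption: the address is encoded as a little-endian 32-bit integer.
        packets.append(struct.pack("<I", base_address + offset) + chunk)
    return packets

def reassemble(packets: List[bytes], base_address: int, size: int) -> bytes:
    """Place each packet's data at its address; packet order does not matter."""
    memory = bytearray(b"\xff" * size)
    for packet in packets:
        (address,) = struct.unpack_from("<I", packet)
        data = packet[4:]
        start = address - base_address
        memory[start:start + len(data)] = data
    # Drop any padding that was written past the end of the image.
    return bytes(memory[:size])

if __name__ == "__main__":
    image = bytes(range(10))                 # stand-in for a real binary file
    packets = packetize(image, 0x00002000)   # hypothetical load address
    print(len(packets), "packets of", len(packets[0]), "bytes")
    print(reassemble(packets, 0x00002000, len(image)) == image)
```

Because every packet carries its own destination address, the client in this sketch can place packets in memory in whatever order they arrive.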
Based on this high-level description of the OTA update process, an OTA update solution must deal with three major challenges. The first challenge is memory. The software solution must place the new software application in the volatile or non-volatile memory of the client device so that it can be executed when the update process is complete. The solution must ensure that the previous version of the software is retained as a backup application in case there is a problem with the new software. In addition, the state of the client device (such as the version of the software that is currently running and its location in memory) must be preserved across resets and power cycles. The second major challenge is communication. The new software must be sent from the server to the client in discrete packets, and each packet must be placed at a specific address in the client’s memory. The packetizing scheme, the packet structure, and the data transfer protocol must all be considered carefully in the software design. The last major challenge is security. When new software is sent wirelessly from the server to the client, we must ensure that the server is a trusted party. This security challenge is called authentication. We must also obfuscate the new software to protect it from eavesdroppers, because it may contain sensitive information. This security challenge is called confidentiality. The last element of security is integrity, which means ensuring that the new software is not corrupted when it is sent wirelessly.
Figure 2. Binary conversion and packetizing process for a software application
Second stage boot loader (SSBL)
Understanding the boot sequence
The primary boot loader is a software application that resides permanently in the read-only memory of the microcontroller. The region of memory in which the primary boot loader resides is called the information space and is sometimes inaccessible to users. This application executes every time a reset occurs, generally performs some essential hardware initialization, and may load user software into memory. However, if the microcontroller contains on-chip non-volatile memory (such as flash memory), the boot loader does not need to load anything and simply transfers control to the program in flash memory. If the primary boot loader does not support OTA updates, a second-stage boot loader is required. Like the primary boot loader, the SSBL runs on every reset, but it implements part of the OTA update process. This boot sequence is shown in Figure 3. This section explains why a second-stage boot loader is needed and why deciding the role of this application is an important design trade-off.
Lesson learned: There must be an SSBL
Conceptually, it seems simpler to omit the SSBL and put all OTA update functions into the user application, because in this way, the OTA process can seamlessly utilize the existing software framework, operating system, and device drivers. Figure 4 shows the memory map and boot sequence of a system that chooses this method.
Application A is the original application deployed on the microcontroller in the field. It includes the OTA update software, which it uses to download application B when the server requests an update. After the download is complete and application B has been verified, application A executes a branch instruction to the reset handler of application B to transfer control to application B. The reset handler is a small piece of code that serves as the entry point of a software application and runs on reset. In this case, the reset is imitated by performing a branch, which is equivalent to a function call. There are two major problems with this approach:
First, many embedded software applications use a real-time operating system (RTOS), which allows the software to be split into multiple concurrent tasks, each with different responsibilities in the system. For example, the application shown in Figure 1 may have one RTOS task for reading the sensors, one for running an algorithm on the sensor data, and one for interfacing with the radio. The RTOS itself is always active and is responsible for switching between these tasks based on asynchronous events or specific time-based delays. As a result, it is not safe to branch from an RTOS task to a new program, because the other tasks will continue to run in the background. With a real-time operating system, the only safe way to terminate a program is through a reset.
Figure 3. Example of memory map and boot flow using SSBL
Figure 4. Example of memory map and boot flow without SSBL
Second, as shown in Figure 4, one solution to the previous problem would be to have the primary boot loader branch to application B instead of application A. On some microcontrollers, however, the primary boot loader always runs the program whose interrupt vector table (IVT) is located at address 0; the IVT is a key part of the application that describes its interrupt handling functions. This means that some form of IVT relocation is needed so that reset maps to application B. If a power cycle occurs during this IVT relocation, the system may be left in a permanently broken state.
Fixing the SSBL at address 0 solves both problems, as shown in Figure 3. The SSBL is not an RTOS program, so it can safely branch to a new application. The IVT of the SSBL at address 0 is never relocated, so there is no risk that a power cycle will leave the system in a catastrophic state.
Design trade-offs: the role of SSBL
We have spent a lot of time discussing the SSBL and its relationship with the application software, but what does the SSBL program actually do? At the very least, it must determine which application is the current one (that is, where it starts) and then branch to that address. The locations of the various applications in the microcontroller’s memory are generally stored in a table of contents (ToC), as shown in Figure 3. This is a shared area of persistent memory that the SSBL and the application software both use to communicate with each other. When the OTA update process is complete, the ToC is updated with the new application’s information. Some parts of the OTA update functionality can also be pushed into the SSBL, and deciding which parts is an important design decision when developing OTA update software. The minimal SSBL described above is very simple, easy to verify, and will probably not need to be modified during the life cycle of the application. However, it means that each application is responsible for downloading and verifying the next application, which may result in duplication of code for the radio stack, device firmware, and OTA update software. Alternatively, we can push the entire OTA update process into the SSBL. In this case, the application only needs to set a flag in the ToC to request an update and then perform a reset. The SSBL then performs the download sequence and the verification process. This minimizes code duplication and simplifies the application-specific software. However, it introduces a new challenge: the SSBL itself may need to be updated (that is, updating the update code). Ultimately, deciding which functions to place in the SSBL depends on the memory constraints of the client device, the similarity between downloaded applications, and the portability of the OTA update software.
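The decision the SSBL makes on each reset can be modeled in a few lines. The sketch below is only a model of the logic, written in Python for readability rather than as firmware. The ToC fields (start_address, version, valid, update_requested), the rule that the highest valid version is the current application, and the example addresses are all assumptions, since the article does not specify the ToC layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TocEntry:
    start_address: int   # where the application begins in flash (hypothetical field)
    version: int         # assumption: a higher number means newer software
    valid: bool          # set once the application has been verified

@dataclass
class Toc:
    entries: List[TocEntry]
    update_requested: bool = False   # flag an application sets before resetting

def ssbl_decide(toc: Toc) -> Tuple[str, Optional[int]]:
    """Model of the SSBL decision made on every reset."""
    if toc.update_requested:
        # The SSBL owns the update process: start the download sequence.
        return ("update", None)
    valid = [entry for entry in toc.entries if entry.valid]
    if not valid:
        # Nothing bootable; a real SSBL might wait here for an update.
        return ("halt", None)
    current = max(valid, key=lambda entry: entry.version)
    return ("boot", current.start_address)

if __name__ == "__main__":
    toc = Toc(entries=[TocEntry(0x4000, 1, True), TocEntry(0x24000, 2, True)])
    print(ssbl_decide(toc))          # boots the newer application
    toc.entries[1].valid = False     # pretend the newer application failed verification
    print(ssbl_decide(toc))          # falls back to the backup application
```

In the minimal-SSBL variant, only the "boot" branch would exist and the applications would handle the download themselves. In the variant where the SSBL owns the whole update, the "update" branch is where the download and verification would run.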
Design trade-offs: caching and compression
Another key design decision in OTA update software is how to organize the incoming application in memory during the OTA update process. A microcontroller generally has two types of memory: non-volatile memory (such as flash memory) and volatile memory (such as SRAM). Flash memory stores application program code and read-only data, as well as other system-level data such as the ToC and event logs. SRAM stores the modifiable parts of the software application, such as non-constant global variables and the stack. The software application binary file shown in Figure 2 contains only the portion of the program that lives in non-volatile memory. The application initializes the portions that belong in volatile memory during its startup routine.
During the OTA update process, each packet that the client device receives from the server, containing part of the binary file, is stored in SRAM. The packet can be compressed or uncompressed. The advantage of compressing the application binary is that it becomes smaller, so fewer packets need to be sent and less SRAM is needed to store them during the download. The disadvantage is that compression and decompression add processing time to the update process, and the compression-related code must be bundled into the OTA update software.
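The effect of compression on the number of packets can be estimated on the server side before deciding whether it is worth the extra client code. The sketch below uses zlib from the Python standard library purely as a stand-in; the article does not name a compression algorithm, and the 64-byte payload size and the synthetic image are assumptions. Real firmware images are usually far less repetitive than this one, so the saving will be smaller in practice.

```python
import zlib

PAYLOAD_SIZE = 64  # assumed number of data bytes carried by each packet

def packets_needed(data: bytes, payload_size: int = PAYLOAD_SIZE) -> int:
    """Number of packets required to carry `data` (ceiling division)."""
    return -(-len(data) // payload_size)

# Synthetic 20 KiB stand-in for an application binary (highly repetitive).
image = bytes(range(256)) * 64 + b"\x00" * 4096

compressed = zlib.compress(image, 9)

if __name__ == "__main__":
    print("uncompressed packets:", packets_needed(image))
    print("compressed packets:  ", packets_needed(compressed))
    # The client must be able to reverse the transformation exactly.
    print("round trip ok:", zlib.decompress(compressed) == image)
```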
The new application software belongs in flash memory but arrives in SRAM during the update process, so the OTA update software must write it to flash memory at some point. Temporarily storing the new application in SRAM is called caching. Broadly, OTA update software can take one of three approaches to caching.
No cache: Each time a packet containing part of the new application arrives, write it to its destination in flash memory. This scheme is very simple and minimizes the amount of logic in the OTA update software, but it requires the region of flash memory reserved for the new application to be fully erased beforehand. This method wears the flash memory and adds overhead.
Partial cache: Reserve an area of SRAM as a cache and store each new packet in this area as it arrives. When the area is full, flush it by writing the data to flash memory. This scheme can become complicated if packets arrive out of order or there are gaps in the new application binary, because a method is needed to map SRAM addresses to flash memory addresses. One strategy is to have the cache act as a mirror of a portion of flash memory. Flash memory is divided into small regions called pages, which are the smallest units that can be erased. Because of this natural division, a good approach is to cache one page of flash memory in SRAM and, when it fills up or the next packet belongs to a different page, flush the cache by writing that page to flash memory.
Full cache: Store the entire new application in SRAM during the OTA update process, and write it to flash memory only after it has been completely downloaded from the server. This method overcomes the shortcomings of the previous methods: it minimizes the number of writes to flash memory and avoids complex caching logic in the OTA update software. However, it limits the size of the new application that can be downloaded, because the amount of available SRAM in a system is usually much smaller than the amount of available flash memory.
Figure 5. Using SRAM to cache a page of flash memory
Figure 5 illustrates the second scheme, partial caching, during the OTA update process. The portion of flash memory corresponding to application A in Figure 3 and Figure 4 is enlarged, and a functional memory map of the SRAM for the SSBL is shown. The sample flash page size is 2 kB. Ultimately, this design decision depends on the size of the new application and the complexity that the OTA update software can tolerate.
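The one-page mirror strategy can be modeled as follows. This is a simplified sketch, not the reference design: flash is simulated with a byte array, the class and method names are hypothetical, each packet is assumed to fit within a single page, and the cache is flushed only when a packet targets a different page or when the download finishes (it does not try to detect that a page is full).

```python
PAGE_SIZE = 2048  # sample flash page size from Figure 5, taken here as 2048 bytes

class SimulatedFlash:
    """Byte-array stand-in for flash that counts page writes."""
    def __init__(self, size: int):
        self.data = bytearray(b"\xff" * size)
        self.page_writes = 0

    def read_page(self, page_index: int) -> bytes:
        start = page_index * PAGE_SIZE
        return bytes(self.data[start:start + PAGE_SIZE])

    def write_page(self, page_index: int, page: bytes) -> None:
        start = page_index * PAGE_SIZE
        self.data[start:start + PAGE_SIZE] = page
        self.page_writes += 1

class PageCache:
    """SRAM mirror of exactly one flash page."""
    def __init__(self, flash: SimulatedFlash):
        self.flash = flash
        self.page_index = None
        self.buffer = bytearray(PAGE_SIZE)
        self.dirty = False

    def write(self, address: int, data: bytes) -> None:
        page_index, offset = divmod(address, PAGE_SIZE)
        if offset + len(data) > PAGE_SIZE:
            raise ValueError("packet straddles a page boundary")
        if page_index != self.page_index:
            # The packet belongs to another page: flush, then mirror the new page.
            self.flush()
            self.buffer[:] = self.flash.read_page(page_index)
            self.page_index = page_index
        self.buffer[offset:offset + len(data)] = data
        self.dirty = True

    def flush(self) -> None:
        if self.dirty:
            self.flash.write_page(self.page_index, bytes(self.buffer))
            self.dirty = False

if __name__ == "__main__":
    flash = SimulatedFlash(4 * PAGE_SIZE)
    cache = PageCache(flash)
    image = bytes(i % 251 for i in range(2 * PAGE_SIZE))
    for address in range(0, len(image), 64):      # 64-byte packets, in order
        cache.write(address, image[address:address + 64])
    cache.flush()                                  # end of download
    print("page writes:", flash.page_writes)
```

Because the address of each packet determines its offset within the mirrored page, packets that arrive out of order within the same page need no extra mapping logic in this model. Packets that alternate between pages would cause repeated flushes, which is one reason the real decision depends on the transfer protocol.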
Security and communication
Design trade-offs: software and protocol
OTA update solutions must also address security and communication. Many systems like the one shown in Figure 1 implement a communication protocol in hardware and software to support the normal (non-OTA update related) operation of the system, such as exchanging sensor data. This means that a (possibly secure) method of wireless communication has already been established between the server and the client. Communication protocols that an embedded system like the one in Figure 1 might use include Bluetooth® Low Energy (BLE) and 6LoWPAN. Sometimes these protocols support security and data exchange that the OTA update software can leverage during the OTA update process.
The amount of communication functionality that must be built into the OTA update software ultimately depends on the degree of abstraction provided by the existing communication protocol. If the existing protocol has facilities for sending and receiving files between the server and the client, the OTA update software can simply use them for the download process. However, if the protocol is more primitive and can only send raw data, the OTA update software may need to perform the packetizing itself and supply metadata along with the new application binary. The same applies to the security challenges: if the communication protocol does not provide confidentiality, the OTA update software may be responsible for decrypting the bytes sent over the air.
In short, which functions are implemented in the OTA update software, such as a custom packet structure, server/client synchronization, encryption, and key exchange, depends on what the system’s communication protocol provides and on the security and robustness requirements of the system. The next section proposes a complete security solution that addresses all of the challenges introduced earlier and shows how to use the microcontroller’s cryptographic hardware peripheral in this solution.
Solving security challenges
Our security solution must keep the new application confidential while it is sent wirelessly, detect any corruption in the new application, and verify that the new application was sent by a trusted server rather than a malicious party. These challenges can be solved with cryptographic operations. Specifically, the security solution can use two cryptographic operations: encryption and hashing. Encryption uses a key (a secret) shared by the client and the server to obfuscate the data sent wirelessly. The specific types of encryption that a microcontroller’s cryptographic hardware accelerator may support are AES-128 or AES-256, depending on the key size. Along with the encrypted data, the server can send a digest to ensure that there is no corruption. The digest is generated by hashing the packet; hashing is an irreversible mathematical function that produces an effectively unique code. If any part of the message or digest is modified after the server generates it, such as a bit flipped during wireless communication, the client will notice the modification when it applies the same hash function to the packet and compares the digests. The specific type of hashing that a microcontroller’s cryptographic hardware accelerator may support is SHA-256. Figure 6 shows a block diagram of the cryptographic hardware peripheral in the microcontroller, with the OTA update software residing in the Cortex-M4 application layer. The figure also shows support for storing protected keys in the peripheral, which the OTA update software solution can use to store the client’s keys securely.
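The integrity check described above can be sketched with the SHA-256 implementation in the Python standard library. The sketch covers hashing only; AES encryption is left out because the standard library does not provide it, and the packet layout (payload followed by a 32-byte digest) and function names are assumptions for illustration. Note that a digest sent in the clear detects accidental corruption but does not by itself authenticate the sender.

```python
import hashlib
import hmac
from typing import Optional

DIGEST_LEN = 32  # SHA-256 produces a 32-byte digest

def make_packet(payload: bytes) -> bytes:
    """Server side: append the SHA-256 digest of the payload."""
    return payload + hashlib.sha256(payload).digest()

def check_packet(packet: bytes) -> Optional[bytes]:
    """Client side: return the payload if the digest matches, otherwise None."""
    payload, received = packet[:-DIGEST_LEN], packet[-DIGEST_LEN:]
    computed = hashlib.sha256(payload).digest()
    # compare_digest performs a constant-time comparison of the two digests.
    return payload if hmac.compare_digest(computed, received) else None

if __name__ == "__main__":
    packet = make_packet(b"\x12" * 64)
    print(check_packet(packet) is not None)           # intact packet is accepted

    corrupted = bytearray(packet)
    corrupted[5] ^= 0x01                               # flip one bit in transit
    print(check_packet(bytes(corrupted)) is None)      # corruption is detected
```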
Figure 6. Hardware block diagram of the crypto accelerator on ADuCM4050
A common technique for solving the final challenge, authentication, is asymmetric cryptography. For this operation, the server generates a public/private key pair. Only the server knows the private key, and the client knows the public key. The server can use the private key to generate a signature for a given block of data, such as the digest of a packet that is to be sent wirelessly. The signature is sent to the client, which can verify it using the public key. In this way, the client can confirm that the message was sent by the server and not by a malicious third party. This sequence is shown in Figure 7. The solid arrows indicate function inputs and outputs, and the dotted arrows indicate information that is sent wirelessly.
Figure 7. Using asymmetric cryptography to authenticate a message
Most microcontrollers do not have hardware accelerators for these asymmetric cryptography operations, but they can be implemented with software libraries such as Micro-ECC, which is designed specifically for resource-constrained devices. The library requires a user-defined random number generation function, which can be implemented using the true random number generator hardware peripheral on the microcontroller. Although these asymmetric cryptography operations solve the trust challenge during the OTA update, they consume a lot of processing time and require a signature to be sent along with the data, which increases the packet size. We could perform this check once at the end of the download, using the digest of the last packet or the digest of the entire new software application, but this would allow a third party to download untrusted software to the client, which is not ideal. Ideally, we want to verify that every packet we receive comes from a server we trust, without the overhead of a signature check each time. This can be achieved with a hash chain.
A hash chain combines the cryptographic concepts discussed in this section and applies them to a sequence of packets so that the packets are linked mathematically. As shown in Figure 8, the first packet (number 0) contains the digest of the next packet. The payload of the first packet is not software application data but the signature. The payload of the second packet (number 1) contains a portion of the binary file and the digest of the third packet (number 2). The client verifies the signature in the first packet and caches the digest, H0, for later use. When the second packet arrives, the client hashes its payload and compares the result with H0. If they match, the client can be sure that this packet comes from the trusted server without the overhead of a signature check. The expensive task of generating the chain is left to the server. The client only needs to cache a digest and hash each packet as it arrives to ensure that the packet is intact and authentic.
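The chain construction and the client-side check can be sketched as follows. This is a model under stated assumptions, not the reference design: each data packet is taken to be the data followed by the digest of the next packet, the last packet carries an all-zero digest as an end marker, and the hash covers the whole packet body. The signature functions are injected as parameters; the demo_sign and demo_verify placeholders below use a shared-key HMAC only so that the example runs with the standard library. They are not asymmetric and are not a substitute for a real signature scheme such as ECDSA.

```python
import hashlib
import hmac
from typing import Callable, List

DIGEST_LEN = 32
END_OF_CHAIN = bytes(DIGEST_LEN)  # assumed marker carried by the last packet

def build_chain(chunks: List[bytes], sign: Callable[[bytes], bytes]) -> List[bytes]:
    """Server side: build the packets back to front so each carries the next digest."""
    packets = []
    next_digest = END_OF_CHAIN
    for chunk in reversed(chunks):
        body = chunk + next_digest
        packets.append(body)
        next_digest = hashlib.sha256(body).digest()
    # next_digest is now H0, the digest of the first data packet.
    packets.append(sign(next_digest) + next_digest)
    packets.reverse()
    return packets

def receive_chain(packets: List[bytes], verify: Callable[[bytes, bytes], bool]) -> bytes:
    """Client side: one signature check, then one hash per packet."""
    first = packets[0]
    signature, expected = first[:-DIGEST_LEN], first[-DIGEST_LEN:]
    if not verify(signature, expected):
        raise ValueError("first packet is not from a trusted server")
    image = bytearray()
    for body in packets[1:]:
        if not hmac.compare_digest(hashlib.sha256(body).digest(), expected):
            raise ValueError("packet failed the hash chain check")
        image += body[:-DIGEST_LEN]
        expected = body[-DIGEST_LEN:]      # cache the digest of the next packet
    if expected != END_OF_CHAIN:
        raise ValueError("chain ended early")
    return bytes(image)

# Placeholder signature functions (assumption: stand-ins for a real asymmetric scheme).
_DEMO_KEY = b"demo-only-key"

def demo_sign(digest: bytes) -> bytes:
    return hmac.new(_DEMO_KEY, digest, hashlib.sha256).digest()

def demo_verify(signature: bytes, digest: bytes) -> bool:
    return hmac.compare_digest(signature, demo_sign(digest))

if __name__ == "__main__":
    chunks = [b"a" * 64, b"b" * 64, b"c" * 10]
    packets = build_chain(chunks, demo_sign)
    print(receive_chain(packets, demo_verify) == b"".join(chunks))
```

In this model the server has to know the entire image before it can send the first packet, because the chain is built from the last packet backwards. The client, in contrast, only ever holds one cached digest.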
Figure 8. Applying a hash chain to a sequence of packets
Two ultra-low power microcontrollers that address the memory, communication, and security design challenges described in this article are the ADuCM3029 and ADuCM4050. These microcontrollers contain the hardware peripherals discussed in this article for OTA updates, such as flash memory, SRAM, a cryptographic accelerator, and a true random number generator. The device family pack (DFP) for these microcontrollers provides software support for building OTA update solutions on these devices. The DFP includes peripheral drivers that provide a simple and flexible interface to the hardware.
To verify the concepts discussed in this article, we used the ADuCM4050 to create an OTA update software reference design. For the client, an ADuCM4050 EZ-KIT® is connected to an ADF7242 using the horseshoe connector on the transceiver daughterboard. The client device is shown on the left side of Figure 9. For the server, we developed a Python application that runs on a Windows PC. The Python application communicates over the serial port with another ADuCM4050 EZ-KIT, which is also connected to an ADF7242 in the same configuration as the client. However, the EZ-KIT on the right in Figure 9 performs no OTA update logic and merely relays the packets received from the ADF7242 to the Python application.
Figure 9. Experimental hardware setup
The software reference design partitions the flash memory of the client device as shown in Figure 3. The main client application is designed to be highly portable and configurable so that it can be used in other arrangements or on other hardware platforms. Figure 10 shows the software architecture of the client device. Note that although we sometimes refer to this entire application as the SSBL, in Figure 10 and from now on we logically separate the true SSBL portion (blue) from the OTA update portion (red), because the latter does not necessarily need to be implemented entirely in the same application. The hardware abstraction layer shown in Figure 10 keeps the OTA client software portable and independent of any underlying libraries (shown in orange).
Figure 10. Client software architecture
The software application implements the boot sequence in Figure 3, a simple communication protocol for downloading a new application from the server, and hash chains. Each packet in the communication protocol has a 12-byte metadata header, a 64-byte payload, and a 32-byte digest. In addition, it has the following features:
Caching: Supports either no caching or caching of one page of flash memory, depending on user configuration.
Table of contents: The ToC is designed to hold only two applications, and a new application is always downloaded to the oldest location so that a backup application is retained. This is called an A/B update scheme.
Messaging: Supports either the ADF7242 or the UART for messaging, depending on user configuration. Using the UART for messaging eliminates the EZ-KIT on the left of Figure 9, leaving only the kit on the right for the client. This wired update scheme is useful for initial system bring-up and debugging.
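The A/B rule in the feature list (always download to the oldest location) is small enough to model directly. The sketch below is illustrative only: the Slot fields, the convention that version 0 marks an empty slot, and the example addresses are assumptions, not details of the reference design.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Slot:
    start_address: int   # hypothetical flash address of the slot
    version: int         # assumption: 0 means empty, higher means newer

def pick_download_slot(slots: List[Slot]) -> int:
    """Return the index of the oldest (or empty) slot, which the update overwrites."""
    return min(range(len(slots)), key=lambda i: slots[i].version)

def commit_update(slots: List[Slot], index: int, new_version: int) -> None:
    """Record the new application only after it has been downloaded and verified."""
    slots[index].version = new_version

if __name__ == "__main__":
    slots = [Slot(0x04000, 3), Slot(0x24000, 2)]
    target = pick_download_slot(slots)    # slot holding version 2 is the oldest
    commit_update(slots, target, 4)
    print(target, [slot.version for slot in slots])
```

After the commit, the slot that was running before the update (version 3 in this example) is untouched and remains available as the backup application.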
In addition to meeting the functional requirements and passing various tests, software performance is also important in judging the success of a project. Two metrics are commonly used to measure the performance of embedded software: footprint and cycles. Footprint refers to the amount of space the software application occupies in volatile (SRAM) and non-volatile (flash) memory. Cycles refers to the number of microprocessor clock cycles the software uses to perform a specific task. This is similar to software run time, except that while performing an OTA update the software may enter a low-power mode in which the microprocessor is inactive and consumes no cycles. Although the software reference design is not optimized for either metric, the measurements are useful for benchmarking the program and comparing design trade-offs.
Figure 11 and Figure 12 show the footprint of the OTA update software reference design implemented on the ADuCM4050 with caching disabled. The figures are broken down by the components shown in Figure 10. As shown in Figure 11, the entire application uses approximately 15 kB of flash memory. Given that the ADuCM4050 contains 512 kB of flash memory, this footprint is very small. The true application software (the software developed for the OTA update process) needs only about 1.5 kB, and the rest is used by libraries such as the DFP, Micro-ECC, and the ADF7242 stack. These results help illustrate the design trade-off of what role the SSBL should play in the system. Most of the 15 kB footprint is used for the update process. The SSBL itself occupies only about 500 bytes, plus 1 kB to 2 kB of DFP code for accessing devices such as the flash driver.
Figure 11. Flash memory footprint (bytes)
Figure 12. SRAM footprint (bytes)
To evaluate the software overhead, we count the cycles each time a packet is received and then compute the average number of cycles consumed per packet. Each packet requires AES-128 decryption, SHA-256 hashing, a flash memory write, and some validation of the packet metadata. With a 64-byte packet payload and no caching, the overhead for processing a single packet is 7409 cycles, which corresponds to approximately 285 microseconds of processing time at a 26 MHz core clock. This value was measured with the cycle-counting driver in the ADuCM4050 DFP (unadjusted cycle count) and is the average over the download of a 100 kB binary file (about 1500 packets). To minimize the per-packet overhead, the drivers in the DFP should use the direct memory access (DMA) hardware peripheral on the ADuCM4050 to perform bus transactions and should place the processor in a low-power sleep state during each transaction. If we disable low-power sleep in the DFP and change the bus transactions to not use DMA, the per-packet overhead increases to 17,297 cycles. This shows how much the efficient use of device drivers affects embedded software applications. Reducing the number of data bytes per packet would also reduce the per-packet overhead, although doubling the number of data bytes per packet to 128 increased the cycle count only slightly, to 8362 cycles in the same experiment.
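The conversion from cycles to processing time, and the cost per byte that follows from it, can be reproduced with a short calculation. The cycle counts and the 26 MHz clock are the figures quoted above; the function name and the labels are only for illustration.

```python
CORE_CLOCK_HZ = 26_000_000  # 26 MHz core clock, as stated in the text

def cycles_to_microseconds(cycles: int, clock_hz: int = CORE_CLOCK_HZ) -> float:
    """Convert a cycle count to microseconds at the given core clock."""
    return cycles / clock_hz * 1e6

# (cycles per packet, payload bytes per packet), taken from the measurements above
measurements = {
    "64-byte payload": (7409, 64),
    "64-byte payload, DMA and sleep disabled": (17297, 64),
    "128-byte payload": (8362, 128),
}

if __name__ == "__main__":
    for label, (cycles, payload) in measurements.items():
        time_us = cycles_to_microseconds(cycles)
        print(f"{label}: {time_us:.1f} us per packet, "
              f"{cycles / payload:.1f} cycles per payload byte")
```

Seen per byte, the larger payload is cheaper: roughly 116 cycles per byte with 64-byte payloads against roughly 65 cycles per byte with 128-byte payloads, because much of the per-packet cost does not grow with payload size.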
The cycle counts and footprints also illustrate the trade-off discussed earlier: caching packet data instead of writing to flash memory every time. With caching of one page of flash memory enabled, the per-packet overhead drops from 7409 to 5904 cycles. This 20% reduction comes from the fact that the flash write is skipped for most packets during the update and is performed only when the cache is full. The price is a larger SRAM footprint. Without caching, the HAL requires only 336 bytes of SRAM, as shown in Figure 12. With caching, however, space equal to a full page of flash memory must be reserved, so the SRAM footprint increases to 2388 bytes. The flash memory used by the HAL also increases slightly, because additional code is needed to determine when the cache must be flushed.
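The two sides of this trade-off can be checked with simple arithmetic on the figures quoted above. The only assumption in the sketch is that the 2 kB flash page corresponds to 2048 bytes.

```python
FLASH_PAGE_SIZE = 2048  # assumption: the 2 kB page size is 2048 bytes

cycles_uncached, cycles_cached = 7409, 5904   # cycles per packet
sram_uncached, sram_cached = 336, 2388        # HAL SRAM footprint in bytes

def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100

sram_increase = sram_cached - sram_uncached

if __name__ == "__main__":
    print(f"cycle reduction: {percent_reduction(cycles_uncached, cycles_cached):.1f}%")
    print(f"SRAM increase: {sram_increase} bytes "
          f"({sram_increase - FLASH_PAGE_SIZE} bytes beyond one flash page)")
```

Under that assumption, almost all of the extra SRAM is the page buffer itself, with only a few additional bytes of bookkeeping.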
These results demonstrate that design decisions have a tangible impact on software performance. There is no one-size-fits-all solution: each system has different requirements and constraints, and the OTA update software must be tailored accordingly. We hope this article has clarified the common problems and trade-offs encountered when designing, implementing, and verifying OTA update software solutions.