To control three different (but related) microcontrollers from a single codebase, we need to examine the accessible hardware features of each chip and work out where they overlap.

[Figure: comparison of the Nordic development boards]

(For more information about what we’re trying to achieve here, take a look at the Introduction.)

This doesn’t mean that we are always forced to use the lowest common denominator of hardware features, but we need to be careful and recognize that software performance will vary across the three microcontrollers. It is a smart idea to form a rough plan around the available resources first.

Part of the challenge will be to extract as much performance as possible from the available hardware and peripherals. Timing, optimization, and benchmarking will figure into the overall development process.

Let’s take a look at what the chips offer in common.

Instruction Set Architecture (ISA)

Starting with the central processing unit itself and its core instruction set, we need to check that all of the processors can run binary-identical code. If so, then we can write a single, executable firmware file.

The nRF51422 uses an ARM Cortex-M0 core, whereas the nRF52832 and nRF52840 use an ARM Cortex-M4F core with a Floating Point Unit (which we’re not going to use).

In fact, the ARM Cortex-M instruction set is upward-compatible between versions. Code compiled for the ARMv6-M architecture will run without modification on ARMv7-M and ARMv7E-M processors.

The Cortex-M0 / M0+ / M1 implement the ARMv6-M architecture, the Cortex-M3 implements the ARMv7-M architecture, and the Cortex-M4 / Cortex-M7 implements the ARMv7E-M architecture. The architectures are binary instruction upward compatible from ARMv6-M to ARMv7-M to ARMv7E-M.

Binary instructions available for the Cortex-M0 / Cortex-M0+ / Cortex-M1 can execute without modification on the Cortex-M3 / Cortex-M4 / Cortex-M7. Binary instructions available for the Cortex-M3 can execute without modification on the Cortex-M4 / Cortex-M7 / Cortex-M33 / Cortex-M35P. Only Thumb-1 and Thumb-2 instruction sets are supported in Cortex-M architectures; the legacy 32-bit ARM instruction set isn’t supported.

— https://en.wikipedia.org/wiki/ARM_Cortex-M#Instruction_sets

One of the key characteristics of the ISA in the Cortex-M processors is the upward compatibility: Instruction supported in the Cortex-M3 processor is a superset of Cortex-M0/M0+/M1. So theoretically if the memory map is identical, a binary image for Cortex-M0/M0+/M1 can run directly on a Cortex-M3. The same applies to the relationship between Cortex-M4/M7 and other Cortex-M processors; instructions available on Cortex-M0/M0+/M1/M3 can run on a Cortex-M4/M7.

— ARM Cortex-M for Beginners
An overview of the ARM Cortex-M processor family and comparison

It is therefore possible to load and run the same firmware on all three chips.

A binary built for the oldest architecture cannot use the additional instructions that the newer chips offer, so it will not always be the most efficient code possible. It will still run everywhere, though, and the compiler can implement workarounds using the instructions that are available.

The exact relationship between the various versions of the Instruction Set Architecture is captured in the following diagram (also from the ARM Cortex-M for Beginners whitepaper):

[Figure: overview of the ARM Thumb instruction set architecture versions]

Each subsequent version is a superset of the previous one and contains all of its instructions. Several useful instructions only appear from the ARMv7-M architecture onward: CLZ (Count Leading Zeros), plus LDREX (Load Exclusive) and STREX (Store Exclusive), which are used to implement atomic synchronization primitives. The smaller ARMv6-M architecture has workarounds for some of these, mostly involving disabling interrupts around critical sections.
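As a rough illustration of the critical-section workaround, here is a minimal sketch of an interrupt-safe counter update that uses only ARMv6-M instructions. The helper names (`irq_save_and_disable`, `irq_restore`, `counter_add`) are hypothetical and not taken from the project or the nRF5 SDK, and the host-side stubs exist only so the sketch compiles off-target.

```c
#include <stdint.h>

/* Hypothetical helpers: the names are illustrative, not from the nRF5 SDK. */
#if defined(__ARM_ARCH_6M__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
static inline uint32_t irq_save_and_disable(void)
{
    uint32_t primask;
    /* Read PRIMASK, then mask all configurable-priority interrupts. */
    __asm__ volatile ("mrs %0, primask\n\tcpsid i" : "=r" (primask) : : "memory");
    return primask;
}

static inline void irq_restore(uint32_t primask)
{
    /* Restore the previous mask state, so nested critical sections work. */
    __asm__ volatile ("msr primask, %0" : : "r" (primask) : "memory");
}
#else
/* Host build (assumption: no interrupts to mask); stubs keep this compilable. */
static inline uint32_t irq_save_and_disable(void) { return 0u; }
static inline void irq_restore(uint32_t primask) { (void)primask; }
#endif

/* Read-modify-write that cannot be interrupted half-way.
 * No LDREX/STREX involved, so the same binary works on all three chips. */
static inline uint32_t counter_add(volatile uint32_t *counter, uint32_t delta)
{
    uint32_t saved = irq_save_and_disable();
    uint32_t updated = *counter + delta;
    *counter = updated;
    irq_restore(saved);
    return updated;
}
```

The trade-off is that interrupts are briefly delayed while the critical section runs, so the protected region should stay as short as possible.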

Which leads us right to the next topic.

Interrupts

The Nested Vectored Interrupt Controller (NVIC) differs between the ARMv6-M and ARMv7-M architectures, and understanding the key differences requires a careful reading of the ARM Architecture Reference Manuals.

Number Of Interrupts

The nRF51 Series chips based on ARMv6-M provide up to 32 customizable user interrupts in the non-relocatable vector table.

For all chips built around the Cortex-M0 core specifically, the interrupt vector table is locked into non-volatile flash memory at address 0x00000000.

ARMv6-M provides an interrupt controller as an integral part of the exception model. The interrupt controller operation aligns with the ARM General Interrupt Controller (GIC) specification, defined for use with other architecture variants and ARMv7 profiles.

The ARMv6-M NVIC architecture supports up to 32 discrete interrupts, IRQ[31:0]. The general registers associated with the NVIC are all accessible from a block of memory in the SCS as described in Table B3-18 on page B3-245.

— ARM v6-M Architecture Reference Manual
Section B3.4: Nested Vector Interrupt Controller

The nRF52 Series chips based on ARMv7-M offer 48 customizable user interrupts in the relocatable vector table. The chips actually have room for up to 64 interrupts in the NVIC control registers, but only have 48 implemented hardware peripherals.

ARMv7-M provides an interrupt controller as an integral part of the ARMv7-M exception model. The interrupt controller operation aligns with the ARM General Interrupt Controller (GIC) specification, defined for use with other ARMv7 profiles and other architectures.

The ARMv7-M NVIC architecture supports up to 496 interrupts. The number of external interrupt lines supported can be determined from the read-only Interrupt Controller Type Register (ICTR) accessed at address 0xE000E004 in the System Control Space. See Interrupt Controller Type Register, ICTR on page B3-674 for the register detail. The general registers associated with the NVIC are all accessible from a block of memory in the System Control Space as described in Table B3-8 on page B3-682.

— ARM v7-M Architecture Reference Manual
Section B3.4: Nested Vector Interrupt Controller

Both of these numbers are important to note, and the exact functioning of the NVIC_ISER registers for each chip will become important during a later bug hunt.
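To make the register layout concrete, here is a small sketch of how an interrupt number maps onto the NVIC_ISER (Interrupt Set-Enable) words. It only illustrates the addressing arithmetic and does not describe the later bug. The function names are made up for this example; the base address 0xE000E100 is the architectural NVIC_ISER0 location in the System Control Space.

```c
#include <stdbool.h>
#include <stdint.h>

/* NVIC_ISER0 lives at the same address on ARMv6-M and ARMv7-M. */
#define NVIC_ISER_BASE ((uint32_t)0xE000E100u)

/* Index of the 32-bit ISER word that holds the enable bit for an IRQ.
 * ARMv6-M implements only word 0 (IRQ 0..31). */
static inline uint32_t nvic_iser_index(uint32_t irqn)
{
    return irqn >> 5;
}

/* Bit mask of the IRQ inside its ISER word. */
static inline uint32_t nvic_iser_mask(uint32_t irqn)
{
    return (uint32_t)1u << (irqn & 31u);
}

/* Address of the ISER word for an IRQ. */
static inline uint32_t nvic_iser_address(uint32_t irqn)
{
    return NVIC_ISER_BASE + 4u * nvic_iser_index(irqn);
}

/* True when the IRQ number fits into ISER0 and can therefore be
 * enabled the same way on all three chips. */
static inline bool irq_is_common(uint32_t irqn)
{
    return irqn < 32u;
}
```

For example, IRQ 37 on an nRF52 Series chip lands in the second word (index 1, bit 5), which simply does not exist on the nRF51422.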

Interrupt Latency

The Cortex-M0 has a 16-cycle interrupt latency, whereas the Cortex-M4 has a 12-cycle interrupt latency.

For an nRF51 Series chip running at 32 MHz, this equates to a ~0.5 µs delay before the interrupt service routine starts to execute.

For an nRF52 Series chip running at 64 MHz, this equates to a ~0.1875 µs delay before the interrupt service routine starts to execute.
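The arithmetic behind these two figures is just cycles divided by clock frequency. A minimal sketch, using picoseconds so that the result stays an exact integer (the function name is hypothetical, and the cycle counts and clock rates are the ones quoted above):

```c
#include <stdint.h>

/* Interrupt latency in picoseconds.
 * One core cycle at f MHz lasts 1,000,000 / f picoseconds. */
static inline uint32_t irq_latency_ps(uint32_t cycles, uint32_t core_mhz)
{
    return (cycles * 1000000u) / core_mhz;
}
```

With the numbers from the text, `irq_latency_ps(16, 32)` gives 500000 ps (0.5 µs) and `irq_latency_ps(12, 64)` gives 187500 ps (0.1875 µs).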

It is not clear whether it makes sense to use interrupts at all, though. Context switches could occur too often and have too much overhead to compete with a single, tightly-written superloop in main().

Vector Table Offset Register (VTOR)

The VTOR hardware register relocates the interrupt vector table, for example into Static Random Access Memory (SRAM) instead of flash memory. This is helpful for certain use cases, but there is no clear need for this feature in this project.

The Cortex-M0 core lacks a Vector Table Offset Register.

The Cortex-M4 core implements the VTOR, but we will not use it.

CPUID

Now let’s take a look at ways that the firmware can identify the chip it is executing on.

On ARMv6-M, a single CPUID register identifies the chip, with a number of bitfields left to implementation-dependent definition:

cpuid armv6m 1
Figure 1. Source: ARM® v6-M Architecture Reference Manual

On the nRF51422, the CPUID register at address 0xE000ED00 returns 0x410CC200:

[Figure: CPUID register value on the nRF51422]

On both the nRF52832 and nRF52840, the CPUID register at address 0xE000ED00 returns 0x410FC241:

[Figure: CPUID register value on the nRF52832]

Both nRF52 Series chips return the same value, so we can’t use the CPUID register by itself to identify the chip. It only tells us which core we are running on.
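The two values above can be pulled apart with the architectural CPUID field layout. The following is a minimal sketch with hypothetical type and function names; the decoding is kept separate from the register read so that it can be checked on a host machine.

```c
#include <stdint.h>

#define SCB_CPUID_ADDR 0xE000ED00UL

#define CPUID_PARTNO_CORTEX_M0 0xC20u
#define CPUID_PARTNO_CORTEX_M4 0xC24u

typedef struct {
    uint8_t  implementer;   /* bits [31:24], 0x41 means ARM     */
    uint8_t  variant;       /* bits [23:20], major revision     */
    uint8_t  architecture;  /* bits [19:16]                     */
    uint16_t partno;        /* bits [15:4],  identifies the core */
    uint8_t  revision;      /* bits [3:0],   minor revision     */
} cpuid_fields_t;

/* Split a raw CPUID value into its fields. */
static inline cpuid_fields_t cpuid_decode(uint32_t cpuid)
{
    cpuid_fields_t f;
    f.implementer  = (uint8_t)((cpuid >> 24) & 0xFFu);
    f.variant      = (uint8_t)((cpuid >> 20) & 0x0Fu);
    f.architecture = (uint8_t)((cpuid >> 16) & 0x0Fu);
    f.partno       = (uint16_t)((cpuid >> 4) & 0xFFFu);
    f.revision     = (uint8_t)(cpuid & 0x0Fu);
    return f;
}

/* Target only: read the register itself. Do not call this on a host. */
static inline uint32_t cpuid_read(void)
{
    return *(volatile uint32_t const *)SCB_CPUID_ADDR;
}
```

Decoding 0x410CC200 yields part number 0xC20 (Cortex-M0), while 0x410FC241 yields 0xC24 (Cortex-M4) at revision 1. That is enough to separate the nRF51422 from the nRF52 Series, but not the nRF52832 from the nRF52840.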

Also, unfortunately, the ARMv6-M architecture lacks a universal set of feature bits that can be queried through the CPUID register. In particular, there is no well-defined method by which interface-compatible, implementation-specific peripherals can be detected and queried. This means that any software that wants to be compatible with multiple, different Cortex-M chips needs to bring with it ahead-of-time knowledge of the peripherals available.

This does make some sense. Since most programmers aren’t going to be writing one compiled firmware object to run on multiple microcontrollers, as this project does, there’s no need to waste any silicon at all on feature bits that will never be queried.

Additionally, because the ARM architecture can be implemented by many different vendors, the unique identifying information is encoded in vendor-specific ways. For instance, Nordic Semiconductor chips have both publicly-described and undocumented hardware identifiers. Other manufacturers, such as NXP and ST Microelectronics, have their own unique product identifiers.

The identifiers we can actually use to pin down the exact chip are scattered throughout the C code of the Nordic nRF5 SDK (versions 12.3 and 16.0 at the time of writing). They live at arbitrary, opaque memory locations and, on the nRF52 Series chips, in the FICR registers.

Mostly, the hardware identifiers are used to trigger workarounds for various Errata, and take the following form:

static inline bool nrfx_usbd_errata_type_52840(void)
{
    return (*(uint32_t const *)0x10000130UL == 0x8UL);
}

static inline bool nrfx_usbd_errata_type_52840_eng_a_or_later(void)
{
    return nrfx_usbd_errata_type_52840();
}

static inline bool nrfx_usbd_errata_type_52840_eng_b_or_later(void)
{
    return (nrfx_usbd_errata_type_52840() &&
           (*(uint32_t const *)0x10000134UL >= 0x1UL));
}

static inline bool nrfx_usbd_errata_type_52840_eng_c_or_later(void)
{
    return (nrfx_usbd_errata_type_52840() &&
           (*(uint32_t const *)0x10000134UL >= 0x2UL));
}

static inline bool nrfx_usbd_errata_type_52840_eng_d_or_later(void)
{
    return (nrfx_usbd_errata_type_52840() &&
           (*(uint32_t const *)0x10000134UL >= 0x3UL));
}

static inline bool nrfx_usbd_errata_type_52833(void)
{
    return (*(uint32_t const *)0x10000130UL == 0x0DUL);
}

static inline bool nrfx_usbd_errata_type_52833_eng_a_or_later(void)
{
    return nrfx_usbd_errata_type_52833();
}
— nRF5 SDK v16.0.0 98a08e2

We can use these hard-coded magic register values to understand what variant of a chip we are using.
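As a sketch of how these magic values could be folded into a single lookup, the following maps the value at 0x10000130 onto a chip type. It uses only the two codes visible in the SDK excerpt above (0x8 and 0x0D); the codes for the nRF51422 and nRF52832 are not shown there, so they are deliberately left as unknown rather than guessed. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHIP_PART_CODE_ADDR    0x10000130UL  /* from the SDK excerpt */
#define CHIP_VARIANT_CODE_ADDR 0x10000134UL  /* from the SDK excerpt */

typedef enum {
    CHIP_UNKNOWN = 0,
    CHIP_NRF52833,
    CHIP_NRF52840
} chip_t;

/* Map the raw part code onto a chip type. Only the values that appear
 * in the SDK excerpt are recognized. */
static inline chip_t chip_classify(uint32_t part_code)
{
    switch (part_code) {
    case 0x08u: return CHIP_NRF52840;
    case 0x0Du: return CHIP_NRF52833;
    default:    return CHIP_UNKNOWN;
    }
}

/* Mirrors the SDK's "eng_X_or_later" checks: in the excerpt, variant codes
 * 0x1, 0x2 and 0x3 are the thresholds for engineering revisions B, C and D. */
static inline bool chip_is_52840_variant_at_least(uint32_t part_code,
                                                  uint32_t variant_code,
                                                  uint32_t min_variant)
{
    return (chip_classify(part_code) == CHIP_NRF52840) &&
           (variant_code >= min_variant);
}

/* Target only: read the raw codes. Do not call these on a host. */
static inline uint32_t chip_read_part_code(void)
{
    return *(volatile uint32_t const *)CHIP_PART_CODE_ADDR;
}

static inline uint32_t chip_read_variant_code(void)
{
    return *(volatile uint32_t const *)CHIP_VARIANT_CODE_ADDR;
}
```

On target, `chip_classify(chip_read_part_code())` would be the entry point. Whether these addresses return anything meaningful on the nRF51422 is an open question in this sketch, so checking the core type first would be the safer order.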

On ARMv7-M, the CPUID mechanism received a major upgrade, with numerous feature bits describing the processor and an entire chapter dedicated to these bits.

One example is the bitfield describing whether the LDREX and STREX instructions are available as synchronization primitives in the ID_ISAR3 and ID_ISAR4 Instruction Set Attribute Registers.

cpuid armv7m 1
Figure 2. Source: ARM® v7-M Architecture Reference Manual
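For illustration, the synchronization-primitive fields can be extracted as shown below. This is a sketch under the assumption that the field positions are SynchPrim_instrs in ID_ISAR3 bits [15:12] and SynchPrim_instrs_frac in ID_ISAR4 bits [23:20], with the registers at 0xE000ED6C and 0xE000ED70; these should be checked against the Architecture Reference Manual before relying on them. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed register addresses in the System Control Space (ARMv7-M only). */
#define SCB_ID_ISAR3_ADDR 0xE000ED6CUL
#define SCB_ID_ISAR4_ADDR 0xE000ED70UL

/* SynchPrim_instrs: assumed to be ID_ISAR3 bits [15:12]. */
static inline uint32_t isar3_synchprim(uint32_t id_isar3)
{
    return (id_isar3 >> 12) & 0xFu;
}

/* SynchPrim_instrs_frac: assumed to be ID_ISAR4 bits [23:20]. */
static inline uint32_t isar4_synchprim_frac(uint32_t id_isar4)
{
    return (id_isar4 >> 20) & 0xFu;
}

/* A non-zero SynchPrim_instrs field means LDREX and STREX are implemented. */
static inline bool has_ldrex_strex(uint32_t id_isar3)
{
    return isar3_synchprim(id_isar3) != 0u;
}
```

These registers do not exist on ARMv6-M, so a unified firmware would have to confirm that it is running on a Cortex-M4 before reading them.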

Although we know that some of these more advanced instructions are available, we will not be making any use of them in this unified firmware. This is fine, since there is no plan to use a multitasking operating system with simultaneous access to shared memory locations.

CLOCK

The CLOCK peripheral base address is 0x40000000 on all three devices.

The nRF51 Series chip has a 32 MHz HFCLK driven by an external crystal, or can also use a 16 MHz RC on-chip oscillator.

nrf51 dk clocks 1
Figure 3. Source: Nordic Semiconductor PCA10028 Block Diagram

The nRF52 Series chips have a 64 MHz HFCLK driven by an external 32 MHz crystal, or can also use a 64 MHz RC on-chip oscillator.

nrf52832 clock 1
Figure 4. Source: Nordic Semiconductor PCA10040 Schematic Layout

All of the development hardware boards also have a 32.768 kHz LFCLK driven by an external crystal, for Realtime Clock (RTC) and low-power timer use.

FLASH

On the nRF51422, we load firmware into FLASH memory starting at address 0x00000000 with a total of 256 KB available storage.

On the nRF52832, we load firmware into FLASH memory starting at address 0x00000000 with a total of 512 KB available storage.

On the nRF52840 Dongle:

  1. We can use FLASH memory starting at address 0x00001000, when using a Master Boot Record to load executables:

    MEMORY
    {
        FLASH (rx) : ORIGIN = 0x1000, LENGTH = 0xdf000
        RAM (rwx) :  ORIGIN = 0x20000008, LENGTH = 0x3fff8
    }
  2. We can eliminate the Master Boot Record and use FLASH memory starting at address 0x00000000, with a total of 1 MB available storage.

We will take the second option, but this requires a physical modification to the nRF52840 Dongle so that we can add a standard-sized Serial Wire Debug connector. We need this connection anyway to gain visibility into the running software and speed up the development process.

Flash Wait-State Penalties

Running code from flash has the following wait-state penalties on the nRF52 Series chips, where the CPU core has to briefly stall itself to allow the code to load:

Executing code from flash will have a wait state penalty on the nRF52 Series. An instruction cache can be enabled to minimize flash wait states when fetching instructions. For more information on cache, see Cache on page 30. The section Electrical specification on page 21 shows CPU performance parameters including wait states in different modes, CPU current and efficiency, and processing power and efficiency based on the CoreMark® benchmark.

— nRF52832 Product Specification
Section 7: CPU

More detailed information is provided in the nRF52832 specifications table:

nrf52832 instruction cache info 1
Figure 5. Source: nRF52832 Product Specification

We will discuss a mitigation for the wait-state speed difference when we talk about Enabling the Instruction Cache.

GPIO

The GPIO Port 0 peripheral base address is 0x50000000 on all three devices.

The GPIO Port 1 peripheral base address is 0x50000300. It is a new addition that only exists on the nRF52840.

[Figure: comparison of the GPIO PIN_CNF registers]

The PIN_CNF pin configuration registers are identical across all chips and all ports.

They really got the hardware right the first time.
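Because the layout is shared, a single helper can build PIN_CNF values for every chip and port. The sketch below assumes the field positions from the Nordic product specifications (DIR bit 0, INPUT bit 1, PULL bits [3:2], DRIVE bits [10:8], SENSE bits [17:16]) and the PIN_CNF[n] offset of 0x700 + 4n. The function names are hypothetical.

```c
#include <stdint.h>

#define GPIO_P0_BASE        ((uint32_t)0x50000000u)
#define GPIO_P1_BASE        ((uint32_t)0x50000300u)  /* nRF52840 only */
#define GPIO_PIN_CNF_OFFSET ((uint32_t)0x700u)

/* Address of PIN_CNF[pin] for a given port base address. */
static inline uint32_t gpio_pin_cnf_address(uint32_t port_base, uint32_t pin)
{
    return port_base + GPIO_PIN_CNF_OFFSET + 4u * (pin & 31u);
}

/* Assemble a PIN_CNF value from its fields.
 *   dir_output:       0 = input, 1 = output
 *   input_disconnect: 0 = input buffer connected, 1 = disconnected
 *   pull:             0 = none, 1 = pull-down, 3 = pull-up
 *   drive:            0 = standard (S0S1), other values select other drive modes
 *   sense:            0 = disabled, 2 = sense high, 3 = sense low */
static inline uint32_t gpio_pin_cnf_value(uint32_t dir_output,
                                          uint32_t input_disconnect,
                                          uint32_t pull,
                                          uint32_t drive,
                                          uint32_t sense)
{
    return ((dir_output & 1u) << 0) |
           ((input_disconnect & 1u) << 1) |
           ((pull & 3u) << 2) |
           ((drive & 7u) << 8) |
           ((sense & 3u) << 16);
}
```

A plain output pin (for an LED, say) comes out as 0x00000003, and an input with pull-up (for a pushbutton) as 0x0000000C, regardless of which of the three chips the code runs on.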

GPIOTE

The GPIOTE peripheral base address is 0x40006000 on all three devices.

The nRF51422 only has 4 CONFIG registers in its GPIOTE peripheral.

The nRF52832 and nRF52840 each have 8 CONFIG registers as well as direct Set and Clear Tasks, which allow hardware-driven control of the configured pins to push them to HIGH or LOW voltage levels.

The GPIOTE peripheral is used to generate pin change interrupts and trigger the interrupt service routines using the Task and Event system. When used in conjunction with the Programmable Peripheral Interconnect, we can set up pushbutton-triggered interactions with the development kits and autonomous LED toggling.

[Figure: comparison of the GPIOTE CONFIG registers]

The implementation of the GPIOTE hardware differs only slightly between the nRF51422/52832 and the nRF52840, which adds support for GPIO Port 1 to the CONFIG register.

Again, they got the hardware right the first time.
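A CONFIG value can likewise be built with one helper for all three chips. This sketch assumes the field positions from the Nordic product specifications (MODE bits [1:0], PSEL bits [12:8], PORT bit 13 on the nRF52840, POLARITY bits [17:16], OUTINIT bit 20) and the CONFIG[n] offset of 0x510 + 4n. The names are hypothetical.

```c
#include <stdint.h>

#define GPIOTE_BASE            ((uint32_t)0x40006000u)
#define GPIOTE_CONFIG_OFFSET   ((uint32_t)0x510u)
#define GPIOTE_COMMON_CHANNELS 4u  /* the nRF51422 only has CONFIG[0..3] */

enum { GPIOTE_MODE_DISABLED = 0, GPIOTE_MODE_EVENT = 1, GPIOTE_MODE_TASK = 3 };
enum { GPIOTE_POL_NONE = 0, GPIOTE_POL_LOTOHI = 1,
       GPIOTE_POL_HITOLO = 2, GPIOTE_POL_TOGGLE = 3 };

/* Address of CONFIG[channel]. */
static inline uint32_t gpiote_config_address(uint32_t channel)
{
    return GPIOTE_BASE + GPIOTE_CONFIG_OFFSET + 4u * channel;
}

/* Assemble a CONFIG value. The port argument must stay 0 on the
 * nRF51422 and nRF52832, where bit 13 is not a port selector. */
static inline uint32_t gpiote_config_value(uint32_t mode, uint32_t port,
                                           uint32_t pin, uint32_t polarity,
                                           uint32_t outinit_high)
{
    return (mode & 3u) |
           ((pin & 0x1Fu) << 8) |
           ((port & 1u) << 13) |
           ((polarity & 3u) << 16) |
           ((outinit_high & 1u) << 20);
}
```

Staying within the first four channels and Port 0 keeps the same configuration valid on every chip; for example, a toggle task on P0.17 that starts high encodes as 0x00131103.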

POWER

The POWER peripheral base address is 0x40000000 on all three devices.

Because the hardware will be connected to a constant source of power, there is no need to implement power-saving or low-power sleep behaviors in the firmware.

RADIO

The RADIO peripheral base address is 0x40001000 on all three devices.

The RADIO peripheral transitions between the following states across all three devices.

radio state diagram 1a
Figure 6. Source: nRF52832 Product Specification

The time required to sample the Received Signal Strength Indicator varies:

Chip       RSSI sample time (µs)   DISABLED → RXIDLE time (µs)
nRF51422   8.80                    130
nRF52832   0.25                    40
nRF52840   0.25                    40

The timing differences between each of the chips will affect the maximum RSSI sampling rate. Since we plan to sample only a single carrier frequency, the longer ramp-up time penalty will only be paid once.

Note
It’s also somewhat unclear whether the 0.25 µs value for the nRF52832 is a typo or not. Why this chip is 35x - 60x faster than the other two is not fully understood.
Note
Update to the above: The RSSI sample time is in fact 0.25 µs on both of the newer nRF52 Series chips. This information was updated between the nRF52840 Product Specification 1.0 and 1.1 releases.
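To get a feel for what these sample times mean, the theoretical upper bound on the RSSI sampling rate is simply the reciprocal of the sample time. The sketch below ignores all CPU and peripheral overhead, so real rates will be lower; the function name is hypothetical.

```c
#include <stdint.h>

/* Upper bound on RSSI samples per second for a given sample time.
 * Ignores task/event handling, register access, and loop overhead. */
static inline uint32_t rssi_max_rate_hz(uint32_t sample_time_ns)
{
    return 1000000000u / sample_time_ns;
}
```

For the nRF51422 (8800 ns) this gives roughly 113,636 samples per second, versus 4,000,000 per second for the nRF52 Series chips (250 ns).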

RAM

Chip       Base Address   Size (KB)
nRF51422   0x20000000     32
nRF52832   0x20000000     64
nRF52840   0x20000000     256

The nRF51422 has the least RAM of the bunch, but 32 KB is actually quite a lot in microcontroller terms, so fitting the firmware into it should be readily achievable.
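Since a single image has to fit the smallest chip, the memory budget can be expressed as a simple check against the flash and RAM sizes listed in this article. This is a minimal sketch with hypothetical names, and the table values are the ones quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    flash_bytes;
    uint32_t    ram_bytes;
} chip_memory_t;

/* Flash and RAM sizes as listed in the FLASH and RAM sections. */
static const chip_memory_t CHIP_MEMORY[] = {
    { "nRF51422",  256u * 1024u,  32u * 1024u },
    { "nRF52832",  512u * 1024u,  64u * 1024u },
    { "nRF52840", 1024u * 1024u, 256u * 1024u },
};

/* True when an image with the given flash and RAM footprint fits on
 * every chip, i.e. within the smallest flash and the smallest RAM. */
static inline bool image_fits_all_chips(uint32_t image_flash_bytes,
                                        uint32_t image_ram_bytes)
{
    for (size_t i = 0; i < sizeof CHIP_MEMORY / sizeof CHIP_MEMORY[0]; ++i) {
        if (image_flash_bytes > CHIP_MEMORY[i].flash_bytes ||
            image_ram_bytes > CHIP_MEMORY[i].ram_bytes) {
            return false;
        }
    }
    return true;
}
```

In practice the footprint numbers would come from the linker map file or from a size-reporting tool run on the built image.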

SysTick Timer

The nRF51422 does not have a SysTick Timer. This is surprising. The workaround for this is to use one of the TIMER peripherals.

The nRF52832 and nRF52840 do have a SysTick Timer.

Since it’s not common to all three devices, we won’t be using it.

UART(E)

The three devices implement two different types of UART, one with DMA support and one without.

Chip       UART0 base address   UART1 base address   DMA
nRF51422   0x40002000           -                    No
nRF52832   0x40002000           -                    Yes
nRF52840   0x40002000           0x40028000           Yes

The nRF52840 adds a second UART, which was sorely missing from the other devices. The non-DMA UART peripheral is deprecated on the nRF52832 and nRF52840 devices.

The UART(E) peripheral offers up to 1 Mbps transfer rate on all devices, though this rate may require the use of High Drive GPIO pins to achieve the proper signal slew rate.

Hardware Errata

Last, but not least, there is always faulty hardware.

Each chip has a certain number of hardware anomalies associated with it that impact software functionality. Interrupts that are expected to work might end up getting stuffed, Direct Memory Access (DMA) may break or refuse to fire completion notifications, signal timings might be delayed, and so on.

Throughout any competent firmware development process, we need to keep an eye on the Errata lists provided by, in this case, Nordic Semiconductor. Manufacturers of other chip lines (NXP, ST Microelectronics, etc.) will always provide Errata for their products as well.

Next Steps

In the next part of this writeup, we’ll look at the Requirements, Architecture, and Tooling documents that incorporate the above information into an actionable plan.

Updated 2020-04-22 22:15:42 +0200.