Building uClinux for STM32F7 Discovery board

The STM32F7 is a Cortex-M7 microcontroller made by STMicroelectronics. The evaluation board is called the “Discovery board” and it’s equipped with several different peripherals. It ships with 16Mbytes of RAM, but unfortunately only 8Mbytes are actually accessible. The reason is that ST selected a 32-bit RAM chip, but on the evaluation board only 16 data lines are actually connected: so you can address the whole RAM, but you can use only half of it 🙂

So today’s challenge is to boot Linux on a system with only 8Mbytes of RAM. To make this possible I started from the BSP provided by Emcraft Systems. You can download the BSP from their web site (https://www.emcraft.com/products/503#software). They have some binaries available, but we want to build our own Linux, and we want to run the new 4.2 kernel. Unfortunately Emcraft doesn’t provide the new 4.2 kernel, so it’s up to us to build it.

As you probably already know, to boot the system we need:

  • U-boot
  • Kernel (we choose 4.2 version)
  • Device Tree
  • Filesystem
  • Toolchain

We are going to use u-boot and the kernel from the Emcraft BSP (compiled with the GNU toolchain for uClinux Cortex-M3/M4), while we are going to create our own filesystem image and toolchain starting from the Buildroot project. The Buildroot project is really great and the guys who work on it are even greater (take a look at the site https://buildroot.org).

I’m not going to focus on how to configure the Emcraft BSP or how to flash the bootloader on the board, because you can find all the information you need on their web site. For the laziest ones, here are some useful links:

All the files you need to build the system can be downloaded from here:

https://drive.google.com/drive/folders/0B_WsUsojgmE6Mld3eHpaX1lyY0k?usp=sharing

Build U-boot

To build u-boot, enter the BSP folder:

~/linux-cortexm-2.1.1

and set up the environment:

. ./ACTIVATE.sh

Then you need to replace the file:

u-boot/include/configs/stm32f746-discovery.h

with the one in the archive file (STM32F7_lorenx.tar.gz). Then you can run the build:

make clean
make distclean
make stm32f746-discovery_config
make

U-boot is now ready to be flashed on the board.

Build Kernel 4.2 and Device Tree

In the archive file (STM32F7_lorenx.tar.gz) you can find the right kernel configuration (.config) and the devicetree files. Replace and add the files in the linux folder and build the kernel:

cd linux
make ARCH=arm CROSS_COMPILE=arm-uclinuxeabi- uImage
make ARCH=arm CROSS_COMPILE=arm-uclinuxeabi- dtbs
mkimage -A arm -O linux -T multi -a 0xC0008000 -e 0xC0008001 -C none -d arch/arm/boot/Image:arch/arm/boot/dts/stm32f746-lorenx.dtb uImageMulti

The kernel will later be loaded by u-boot through tftp. Setting up a tftp server and booting via tftp are out of the scope of this post.

Build Buildroot (2016.08.1)

The final step is to build the root filesystem, which will later be copied to an SD card. The configuration (.config) can be found in the archive file (STM32F7_lorenx.tar.gz).

The filesystem has a very minimal configuration and its footprint is about 2Mbytes. The filesystem can also be customised using the “menuconfig” command provided by buildroot.

My configuration will also build a toolchain which can be used to build applications for the system. The toolchain provides the build utilities and also the most important libraries, such as the POSIX threads library. libpthread is not available in the Cortex-M toolchain downloaded from the Emcraft website, but it is a very important library which is widely used in many applications. The toolchain can be found in the “output/host/usr” folder, and can be customised with any suffix: I have chosen the “inxpect” suffix, which is the name of the company I’m currently working for (great project, take a look http://www.inxpect.com).

After running the “make” command you can find the filesystem in the “output/images” folder. Pick the “.cpio” file and extract it onto your SD card. I’ve used an SD card formatted with a FAT filesystem: if you want to use an Ext3/4 filesystem, remember to add support for it in the kernel. My recommendation is to use Ext3/4 because file links are preserved: pay attention if you use FAT, because several links can be broken! To boot the system with a FAT filesystem you need to replace some of the files which link to busybox: init, mount, sh, login, getty, mkdir, …

Boot the system

The system can now be started. Flash u-boot on the board, insert the SD card, connect the network, connect your PC to the serial port and power up. Stop u-boot and configure the environment:

setenv bootargs 'stm32_platform=stm32f7-disco console=ttyS5,115200 root=/dev/mmcblk0p2 rootwait'
tftp stm32f7/uImage
bootm

The “root” argument depends on how your SD card is partitioned: I’m using partition number 2 of the SD card.

While running the system you can experience several OOM (Out Of Memory) errors: this is because the memory is very limited and you don’t have swap (no MMU, no swap).

Unfortunately with this configuration we don’t have enough space to run a Linux system seriously: I think that Linux needs at least 16Mbytes of RAM to run properly with this configuration. Maybe with a small upgrade of the RAM things will start to get better…

That’s all…


Change Linux CPU default scheduler

Linux is a fair system, so it is happy to make the same CPU time available to any process. In fact the default Linux CPU scheduler is CFS, the Completely Fair Scheduler.

In embedded applications it is often required to run processes with different priorities: some tasks have a higher priority than others, which can run when the CPU is not busy with critical stuff. That’s why Linux provides other scheduling policies, like round robin (RR) and FIFO, which are classified as real-time.

What happens when you mix some processes using an RT scheduler and other processes using CFS? The behaviour can be very strange. In my personal experience I had RT tasks stuck for a long time because the processes which belonged to the CFS scheduler domain hadn’t run for a long time, and they had a lot of unused time slice. This led to unexpected behaviour and missed deadlines for critical tasks.

To overcome this problem I had the idea of using the same scheduler domain for all the processes in the system, starting from init down to all its children. What I needed was to change the Linux kernel to make this possible.

The hack seems hard but it is very easy. Basically the only functions which have to be changed are sched_fork and _sched_setscheduler in the core file of the Linux scheduler (kernel/sched/core.c).

The hack basically makes RR the default scheduler for all created processes, modifying the behaviour of the fork function. It also changes the behaviour of the “set scheduler” function to force the RR scheduler.

The modifications to sched_fork (line 1926 of the Xilinx 3.18 kernel) and to _sched_setscheduler (line 3642 of the same kernel) are shown below:

-     p->prio = current->normal_prio;
+     /* Lorenzo Nava: force policy to RR */
+     if (p->policy == SCHED_NORMAL) {
+         p->prio = current->normal_prio - NICE_WIDTH -
+                 PRIO_TO_NICE(current->static_prio);
+         p->normal_prio = p->prio;
+         p->rt_priority = p->prio;
+         p->policy = SCHED_RR;
+         p->static_prio = NICE_TO_PRIO(0);
+     }

+     /* Lorenzo Nava: force policy of process to RR */
+     if (attr.sched_policy == SCHED_NORMAL) {
+         attr.sched_priority = param->sched_priority -
+                 NICE_WIDTH - attr.sched_nice;
+         attr.sched_policy = SCHED_RR;
+     }
+

Now all of your processes use the RR scheduler by default. The initial priority is proportional to the one used with the standard default scheduler.

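With this type of scheduling you can preempt low priority tasks with higher priority ones. The priority and the time slice can be modified using the standard Linux interfaces: the priority through the scheduling system calls such as sched_setscheduler, the RR time slice through the sched_rr_timeslice_ms sysctl.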
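
From user space, the priority of a process is normally changed with the POSIX scheduling calls declared in <sched.h>. Here is a minimal sketch, assuming a Linux system and a caller with sufficient privileges. The helper names clamp_rr_priority and set_rr_priority are made up for this example and are not part of the kernel patch above.

```c
#include <sched.h>
#include <string.h>

/* Clamp a requested priority into the range the kernel accepts for SCHED_RR. */
int clamp_rr_priority(int requested)
{
    int lo = sched_get_priority_min(SCHED_RR);
    int hi = sched_get_priority_max(SCHED_RR);

    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}

/*
 * Switch the calling process to SCHED_RR with the given priority.
 * Returns 0 on success, or -1 with errno set (typically EPERM when the
 * caller is not allowed to use real-time policies).
 */
int set_rr_priority(int requested)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = clamp_rr_priority(requested);
    return sched_setscheduler(0, SCHED_RR, &param);
}
```

A process started as root could call set_rr_priority(50) early in main() to run above every RR task with a lower priority value.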

That’s it…


Contiguous memory on ARM and cache coherency

In my working experience I faced the problem of managing a large chunk of memory used for DMA transfers from an external device to my ARM CPU. The device didn’t implement scatter/gather, so I had to reserve a large chunk of memory that I used as a circular buffer. This circular buffer is then mapped into user space using mmap and accessed by the user. My main problem was that the user needs to process the data coming from the device through the cache, in order to get higher performance. What I had to do was:

  • allocate a large portion of memory (200 Mbytes) used for DMA transfers from device
  • make the memory available to the user without copies
  • make the memory cacheable for the user (manage cache coherency)

The first problem can be overcome using CMA (the contiguous memory allocator) with a little hack which will allow us to reserve cacheable memory. CMA has the advantage that it will reserve only the memory actually used for DMA, so we won’t take useful memory away from the system.

The main issue with CMA is that the reserved pages are marked as non-cacheable, and unfortunately we must use this memory for processing. My choice was to define a new function my_dma_alloc_coherent, which is similar to the standard dma_alloc_coherent function, but uses different page protection attributes. dma_alloc_coherent reserves the so-called “device” memory on ARM architectures, which is defined as bufferable and non-cacheable memory. What I need is “normal” memory, which is bufferable and cacheable: this is the type of memory returned by, for example, the kmalloc function. Of course this approach requires coherency to be managed properly, but we will talk about this later.

A different solution is to simply use the dma_alloc_coherent function and map the user memory as cacheable: this is discouraged on ARMv7 systems. The reference manual states that mapping the same physical address through pages with different attributes may lead to undefined behaviour. Actually, if you don’t use the kernel virtual address to access data, I don’t see any problem in using this approach.

So the only hack required is to modify the arm_dma_alloc function defining page protection flags as, for example:

/* assumption: start from PAGE_KERNEL and select write-allocate (cacheable) memory */
pgprot_t prot = __pgprot_modify(PAGE_KERNEL, L_PTE_MT_MASK,
        L_PTE_MT_WRITEALLOC | L_PTE_XN);

The next step is to map the memory to userspace: this can be easily achieved by defining the mmap function for our device this way:

start = (unsigned long) vma->vm_start;
size = (unsigned long) vma->vm_end - vma->vm_start;
page = (unsigned long) my_device->dma_phys;
ret = remap_pfn_range(vma, start, page >> PAGE_SHIFT, 
        size, vma->vm_page_prot);

I didn’t use dma_mmap_coherent because, once again, it maps the memory as non-cacheable, and this is something I was trying to avoid.

Now the only thing that I had to guarantee was that the cache remains coherent across DMA transfers. This was easily achieved using dma_sync_single_for_device and dma_sync_single_for_cpu. These functions perform the proper clean and invalidate operations on the cache, to guarantee that the device or the CPU reads up-to-date data. They must be called, respectively, any time:
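
The handler shown here trusts the size requested by user space. A driver would usually also check that the requested range fits inside the DMA buffer before calling remap_pfn_range. The following is a plain C sketch of that arithmetic, not the code I used: the function names are hypothetical, the page size is assumed to be 4 KiB, and in a real driver you would use PAGE_SHIFT and vma->vm_pgoff instead.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12UL  /* assumption: 4 KiB pages */

/* Page frame number of a physical address, as passed to remap_pfn_range(). */
unsigned long phys_to_pfn(unsigned long phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/*
 * Return true when a mapping of [vm_start, vm_end) that starts pgoff pages
 * into the buffer stays inside a buffer of buf_size bytes.
 */
bool mmap_request_fits(unsigned long vm_start, unsigned long vm_end,
                       unsigned long pgoff, unsigned long buf_size)
{
    unsigned long size, offset;

    if (vm_end <= vm_start)
        return false;
    size = vm_end - vm_start;

    /* reject offsets beyond the buffer before converting them to bytes */
    if (pgoff > (buf_size >> SKETCH_PAGE_SHIFT))
        return false;
    offset = pgoff << SKETCH_PAGE_SHIFT;

    return size <= buf_size - offset;
}
```

In an mmap handler the check would come first, returning -EINVAL when mmap_request_fits reports false.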

  • user modifies data and wants to send them to device
  • a DMA transfer is completed, so data in RAM has been modified

There’s no need to perform these operations when data is manipulated only by the user, which can write and read it without any cache coherency problem.

Well I think that’s all for now.


Use SHT71 sensor with Beaglebone

Let’s start with something soft, like interfacing an SHT71 temperature and humidity sensor with a Beaglebone device.

It may appear to be quite easy to do, but there are a couple of tricky steps to follow to make sure everything just works.

First of all let’s configure the hardware layout. To make the SHT71 sensor work correctly (datasheet here) we need to connect its 4 pins:

  • VDD (we can choose between 3.3V or 5V)
  • GND
  • DATA
  • SCK

The first choice to make is: what power supply voltage do I need to use? Both 3.3V and 5V are ok, but we need to know that 3.3V allows a maximum clock frequency of 1MHz (according to page 5 of the datasheet). So if we go with 3.3V we need to make a simple hack to the SHT71 Linux driver.

We also need to choose a couple of GPIOs which we can use for the DATA and SCK lines. I decided to use GPIO3_19 (DATA) and GPIO3_21 (SCK), which appear to be unused in the standard configuration. Take care: you need to pull up the DATA line (a 10K resistor should be fine) to make the SHT71 sensor work correctly.

The hardware configuration is now ok. Let’s start with software integration.

The first step is to check that the SHT15 driver is enabled in the kernel configuration of your Beaglebone. You can find the driver in the “Device Drivers -> Hardware Monitoring support” section of your kernel configuration. My personal advice is to NOT select it as a module, which I consider a good general rule to follow when dealing with embedded devices.

The next step is to patch the driver if you chose the 3.3V power supply voltage. The hack is really simple: you just need to edit the driver “drivers/hwmon/sht15.c” and check these defines:

#define SHT15_TSCKL	100	/* (nsecs) clock low */
#define SHT15_TSCKH	100	/* (nsecs) clock high */
#define SHT15_TSU	150	/* (nsecs) data setup time */

They are set according to a setup which uses a clock of up to 5MHz (100ns low plus 100ns high). The setup should be fine if you use a 5V power supply. If you’ve decided to use a 3.3V power supply, you must change these values to make them compatible with a 1MHz clock. Just multiply the values by ten and that’s it. If you want you can slow down the frequency to 100KHz or even less (maybe you need it if you have a poor oscilloscope :-))

Now we have to make the system able to load the driver and use the SHT71 sensor. Unfortunately the driver is not very up to date and it doesn’t work with the new device tree structure. There are two solutions:
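
The effect of scaling the timings can be checked with a little arithmetic: the SCK period is the sum of the low and high times. A small sketch follows; the function name sck_frequency_hz is made up for this example, and the result is only an upper bound because it ignores the time spent toggling the GPIOs.

```c
/* Upper bound of the SCK frequency in Hz for the given low and high times (ns). */
unsigned long sck_frequency_hz(unsigned long tsckl_ns, unsigned long tsckh_ns)
{
    unsigned long period_ns = tsckl_ns + tsckh_ns;

    if (period_ns == 0)
        return 0;
    return 1000000000UL / period_ns;
}
```

With the stock values, sck_frequency_hz(100, 100) gives 5000000 (5MHz). After multiplying the defines by ten, sck_frequency_hz(1000, 1000) gives 500000 (500KHz), which is within the 1MHz limit for a 3.3V supply.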

  1. make the driver compatible with device tree structure
  2. register the platform driver in the board init function

The first solution is definitely the best one, but I’ve chosen the second one because it is faster to implement. I hope I’ll be able to explore the first solution in the next weeks: the device tree approach needs some fixes in the driver to gain access to the configuration parameters.

To register the device we need to edit the file “arch/arm/mach-omap2/board-generic.c” and add these definitions:

#include <linux/platform_device.h>
#include <linux/platform_data/sht15.h>
...

static struct sht15_platform_data beagle_sht15 = {
    .gpio_data = 115,
    .gpio_sck = 117,
    .supply_mv = 3300,
    .checksum = false,
};

static struct platform_device sht15_device = {
    .name = "sht71",
    .id = 0,
    .dev = {
        .platform_data = &beagle_sht15,
    },
};

static void __init omap_generic_init(void)
{
...
    platform_device_register(&sht15_device);
...
}

We’ve defined the “beagle_sht15” device, which will work at 3.3V and won’t have the checksum enabled. The GPIO numbers are obtained with this simple computation:

  • GPIO3_19 number is: 3 * 32 + 19 = 115
  • GPIO3_21 number is: 3 * 32 + 21 = 117

In general the number corresponding to GPION_M is obtained as:

  • N * 32 + M
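
The same computation written as a tiny C helper. This is only a sketch: the name gpio_number is made up, and it assumes 32 lines per GPIO bank as in the formula above.

```c
#define GPIOS_PER_BANK 32  /* assumption: 32 lines per bank, as in the formula */

/* Linux GPIO number for GPIO<bank>_<pin>, or -1 if the arguments are invalid. */
int gpio_number(int bank, int pin)
{
    if (bank < 0 || pin < 0 || pin >= GPIOS_PER_BANK)
        return -1;
    return bank * GPIOS_PER_BANK + pin;
}
```

So gpio_number(3, 19) is 115 and gpio_number(3, 21) is 117, the values used in the platform data.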

Now we’re ready to use the SHT71 sensor with our Beaglebone. We just have to recompile the kernel and boot the system. The information about temperature and humidity can be found by typing:

cat /sys/class/hwmon/hwmonX/device/temp1_input
cat /sys/class/hwmon/hwmonX/device/humidity1_input
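
The hwmon sysfs convention is to report temperature in millidegrees Celsius and humidity in milli-percent, so the raw numbers have to be divided by 1000. Here is a minimal sketch of the conversion. The helper name hwmon_milli_to_unit is made up, and it assumes the string has already been read from one of the files above.

```c
#include <stdlib.h>

/* Convert a hwmon sysfs string such as "23450\n" to whole units (23.45). */
double hwmon_milli_to_unit(const char *text)
{
    return strtol(text, NULL, 10) / 1000.0;
}
```

A temp1_input content of “23450” then means 23.45 °C, and a humidity1_input content of “45000” means 45% relative humidity.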

And that’s it.


Boot in progress…

I’ve created this blog to share my experience with Linux embedded systems. I think that this blog can be a good place to discuss with embedded engineers out there. Hopefully I will be able to help some of them with their work and certainly I will have the chance to learn something new and improve my knowledge.

I’ll be back soon with some posts…
