Infineon Aurix Project 2020

The main aim of the project is to perform machine learning on the AURIX TC297 TFT board (TC29x B-Step MCU). Given the board's features and its memory and computation constraints, a study was carried out to determine which machine learning regression models could be adapted to run on the device.

The essential board components for this purpose are the Ethernet interface, the display, and multicore execution. The board communicates with a Python client installed on the host PC through the Ethernet connection, chosen for its reliability and flexibility. The display is the simplest interface for a quick view of the available data, without waiting for a download and post-processing of the results. Multicore execution is used to increase computation performance.

Through the Ethernet connection it is possible to send data to the board and receive the predictions from the models implemented on it. Communication takes place over the standard TCP/IP protocol: the board acts as a server and responds to connection requests from clients (which in our case study run on the host PC, but could run on other boards as well). The scatter plot of the predictions, typically used to show the results of regression models, is also drawn on the board screen during data acquisition.
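As a rough sketch (not the project's actual code), a board-side server of this kind can be built on the raw API of lwIP, the TCP/IP stack commonly paired with the iLLD Ethernet examples; the port number and the handle_client_data bridge towards the processing cores are assumptions made for illustration.

    #include "lwip/tcp.h"

    #define SERVER_PORT 5555   /* assumed port number */

    /* Assumed bridge to the packaging code on Core 0 (see the multicore section). */
    void handle_client_data(struct tcp_pcb *tpcb, const void *data, u16_t len);

    /* lwIP callback: a segment arrived from a connected client. */
    static err_t server_recv(void *arg, struct tcp_pcb *tpcb, struct pbuf *p, err_t err)
    {
        (void)arg; (void)err;
        if (p == NULL) {                  /* remote side closed the connection */
            tcp_close(tpcb);
            return ERR_OK;
        }
        /* Single-pbuf case kept for brevity; a chained pbuf would be walked here. */
        handle_client_data(tpcb, p->payload, p->len);
        tcp_recved(tpcb, p->tot_len);     /* tell lwIP the data has been consumed */
        pbuf_free(p);
        return ERR_OK;
    }

    /* lwIP callback: a connection request from a client was accepted. */
    static err_t server_accept(void *arg, struct tcp_pcb *newpcb, err_t err)
    {
        (void)arg; (void)err;
        tcp_recv(newpcb, server_recv);    /* register the receive callback */
        return ERR_OK;
    }

    /* Create, bind, and listen: after this, the board answers connection requests. */
    void server_init(void)
    {
        struct tcp_pcb *pcb = tcp_new();
        tcp_bind(pcb, IP_ADDR_ANY, SERVER_PORT);
        pcb = tcp_listen(pcb);            /* replaces pcb with a smaller listening one */
        tcp_accept(pcb, server_accept);
    }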

For further details you can read the following articles:

Display: http://cas.polito.it/IFX-AURIX@PoliTo-University/?p=380

Ethernet: http://cas.polito.it/IFX-AURIX@PoliTo-University/?p=381

ML: http://cas.polito.it/IFX-AURIX@PoliTo-University/?p=371

Multi-core: http://cas.polito.it/IFX-AURIX@PoliTo-University/?p=379

Multicore execution

This section is part of the Infineon Aurix Project 2020

One of the main goals of the project is the execution of programs on the three processor cores of the board. For the problem at hand, this feature lets us run machine learning predictions in parallel, speeding up the processing of large amounts of data when needed, and lets us overlap computation-intensive machine learning models with other activities. We therefore implemented the producer/consumer pattern: Core 0 acquires data from outside through the Ethernet connection, while Core 1 and Core 2 process it.

Core 0 deals with the acquisition of data from the Ethernet connection and packages it in a data structure together with scheduling information, such as the machine learning algorithm to apply and the client ID; this allows us to manage multiple client connections and predictions from multiple algorithms. The new data structure is inserted into a shared buffer, using appropriate enqueue and dequeue functions, and Core 1 and Core 2 are woken up. Core 0 also collects the results of the computation and sends them back to the client through the Ethernet connection.
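To make the scheme concrete, here is a minimal sketch of the job descriptor and of the enqueue step performed by Core 0. All names, field layouts, and sizes (Job, JobQueue, signal_new_job, the lock helpers) are illustrative assumptions, not the project's actual code; the lock helpers are sketched at the end of this section.

    #include <stdint.h>

    #define JOB_QUEUE_LEN 16u
    #define MAX_FEATURES   8u

    /* One prediction request, as packaged by Core 0 (hypothetical layout). */
    typedef struct {
        uint8_t clientId;               /* which client connection it came from */
        uint8_t algoId;                 /* which regression model to apply      */
        uint8_t nFeatures;              /* valid entries in features[]          */
        float   features[MAX_FEATURES]; /* input vector received over Ethernet  */
    } Job;

    /* Fixed-size ring buffer shared by the three cores. */
    typedef struct {
        Job      slots[JOB_QUEUE_LEN];
        uint32_t head;                  /* next slot to dequeue */
        uint32_t tail;                  /* next slot to fill    */
        uint32_t count;                 /* current occupancy    */
    } JobQueue;

    JobQueue g_jobQueue;

    /* Per-buffer lock and wake-up helpers, sketched later in this section. */
    void job_lock(void);
    void job_unlock(void);
    void signal_new_job(void);          /* assumed wake-up for Core 1 and Core 2 */

    /* Core 0: enqueue one request; returns 0 on success, -1 if the buffer is full. */
    int jobQueue_enqueue(const Job *job)
    {
        int ok = -1;
        job_lock();
        if (g_jobQueue.count < JOB_QUEUE_LEN) {
            g_jobQueue.slots[g_jobQueue.tail] = *job;
            g_jobQueue.tail = (g_jobQueue.tail + 1u) % JOB_QUEUE_LEN;
            g_jobQueue.count++;
            ok = 0;
        }
        job_unlock();
        if (ok == 0) {
            signal_new_job();           /* wake the idle consumer cores */
        }
        return ok;
    }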

Core 1 and Core 2 work in the same way, acting on the requests received from Core 0. If no data is ready in the shared buffer, the two cores put themselves in idle mode and are woken up by Core 0 when new predictions are needed. Once awake, each core consumes an element of the buffer and executes the prediction model selected by the values in the data structure. The results are then saved into a second buffer, from which Core 0 progressively sends them back to the client. Concurrently, the results are also printed on the screen for a quick inspection.
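Building on the previous sketch (and reusing its declarations), the loop executed by Core 1 and Core 2 could look as follows; wait_for_job, run_model, and resultQueue_enqueue are again assumed names rather than the project's actual functions.

    /* Result of one prediction (hypothetical layout). */
    typedef struct {
        uint8_t clientId;               /* routes the answer to the right client */
        float   prediction;
    } Result;

    /* Assumed helpers: block until Core 0 signals, run the selected model,
     * and append to the result buffer (protected by its own lock). */
    void  wait_for_job(void);
    float run_model(uint8_t algoId, const float *x, uint8_t n);
    void  resultQueue_enqueue(const Result *r);

    /* Entry point executed identically by Core 1 and Core 2. */
    void consumer_main(void)
    {
        for (;;) {
            Job job;
            int have = 0;

            job_lock();
            if (g_jobQueue.count > 0u) {    /* try to take one element */
                job = g_jobQueue.slots[g_jobQueue.head];
                g_jobQueue.head = (g_jobQueue.head + 1u) % JOB_QUEUE_LEN;
                g_jobQueue.count--;
                have = 1;
            }
            job_unlock();                   /* never compute with the lock held */

            if (!have) {
                wait_for_job();             /* idle until Core 0 signals new data */
                continue;
            }

            Result res;
            res.clientId   = job.clientId;
            res.prediction = run_model(job.algoId, job.features, job.nFeatures);
            resultQueue_enqueue(&res);      /* picked up by Core 0 and sent back */
        }
    }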

Multicore execution is also useful to manage multiple client connections. Since the board can accept several connections at a time, it is important to dispatch the incoming data to the two cores correctly and avoid conflicts. As mentioned above, a client ID is used to keep track of the different connections. In the following video, two clients request a connection to the server and, once the connection is established, send their data to be processed.
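The client ID also tells Core 0 where each result must go back. A hypothetical routing step, assuming the lwIP raw API from the earlier sketch and a small connection table filled in the accept callback, could be:

    #include <stdint.h>
    #include "lwip/tcp.h"

    #define MAX_CLIENTS 4u

    /* Connection table indexed by client ID (filled when connections are accepted). */
    static struct tcp_pcb *g_clients[MAX_CLIENTS];

    /* Core 0: send one finished prediction back to the client that requested it. */
    void send_result(uint8_t clientId, float prediction)
    {
        struct tcp_pcb *pcb = (clientId < MAX_CLIENTS) ? g_clients[clientId] : NULL;
        if (pcb != NULL) {
            /* TCP_WRITE_FLAG_COPY: lwIP copies the value into its own buffers */
            tcp_write(pcb, &prediction, sizeof prediction, TCP_WRITE_FLAG_COPY);
            tcp_output(pcb);    /* push the segment out without waiting */
        }
    }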


The final aspect to be examined is concurrency control. Because there are two shared buffers, the first containing the data structures built from the external inputs and the second filled with the results, access must be controlled to prevent multiple cores from touching the same data at the same time. For this purpose, two locks are implemented, one for each buffer, taking care to release them correctly to avoid deadlocks.
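A minimal sketch of the two per-buffer locks follows, implemented here as test-and-set spinlocks on top of the GCC atomic builtins available in the TriCore toolchain (the iLLD library also offers mutex helpers that could be used instead); this is an illustrative assumption, not the project's actual locking code. The discipline that avoids deadlock: each core holds at most one lock at a time, only around the queue operation itself, and every acquire is paired with a release on every path.

    /* One spinlock per shared buffer; a core never holds both at once. */
    static volatile unsigned char g_jobLockVar    = 0;  /* guards the input job buffer */
    static volatile unsigned char g_resultLockVar = 0;  /* guards the result buffer    */

    static void spin_acquire(volatile unsigned char *l)
    {
        /* Atomically set the flag; loop while another core already holds it. */
        while (__atomic_test_and_set(l, __ATOMIC_ACQUIRE)) {
            /* busy-wait */
        }
    }

    static void spin_release(volatile unsigned char *l)
    {
        __atomic_clear(l, __ATOMIC_RELEASE);
    }

    void job_lock(void)      { spin_acquire(&g_jobLockVar); }
    void job_unlock(void)    { spin_release(&g_jobLockVar); }
    void result_lock(void)   { spin_acquire(&g_resultLockVar); }
    void result_unlock(void) { spin_release(&g_resultLockVar); }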