Within the SAFEXPLAIN (SE) project, project partner Ikerlan leads the railway case study (CS), which centres on Automatic Train Operation (ATO). This article describes how this CS is integrated into the reference safety architecture, building on the project's foundational work and on previous articles about implementing safety functions and test activities.
Integration into ROS2
The case study was first integrated into the ROS2 (Robot Operating System 2) middleware by creating a node for each feature. Figure 1 shows this initial architecture, including the structure of each node and the interactions between nodes.
The initial architecture contains four nodes: video player, object and track detection, stereo depth estimation, and decision function. Together, this pipeline detects potential obstacles on the track and, depending on their distance from the train, issues different alerts to the driver.
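The alerting step at the end of this pipeline can be sketched as a mapping from the distance of the nearest detected obstacle to an alert level. This is a minimal, ROS2-free Python sketch; the alert levels, the `THRESHOLDS_M` values and the function names are hypothetical, since the article does not state the values used in the case study.

```python
from enum import Enum


class Alert(Enum):
    """Hypothetical alert levels, ordered by urgency."""
    NONE = 0
    CAUTION = 1
    WARNING = 2
    EMERGENCY = 3


# Hypothetical thresholds (upper distance bound in metres, alert level),
# checked from the most to the least urgent.
THRESHOLDS_M = [
    (100.0, Alert.EMERGENCY),
    (300.0, Alert.WARNING),
    (600.0, Alert.CAUTION),
]


def alert_for_distance(distance_m: float) -> Alert:
    """Map the distance to a single obstacle on the track to an alert level."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    for limit_m, level in THRESHOLDS_M:
        if distance_m <= limit_m:
            return level
    return Alert.NONE


def alert_for_obstacles(distances_m: list[float]) -> Alert:
    """Only the nearest obstacle determines the alert; no obstacle means no alert."""
    if not distances_m:
        return Alert.NONE
    return alert_for_distance(min(distances_m))


print(alert_for_obstacles([850.0, 250.0]))  # Alert.WARNING
```

Using only the nearest obstacle keeps the logic conservative: a distant detection can never mask a closer one.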
However, these nodes have no mechanisms for handling failures: no node has any information about the others, so the status of each node is completely independent, even though the nodes depend on one another to function.
To address this, a middleware has been defined that provides a skeleton for diagnostic and monitoring mechanisms.
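A heartbeat watchdog is one common example of such a monitoring mechanism: each node periodically reports that it is alive, and a monitor flags nodes that stop reporting. The sketch below is a hypothetical illustration, not the SAFEXPLAIN middleware's actual design; the `HeartbeatMonitor` class, the node names and the timeout are assumptions, and timestamps are passed in explicitly so the logic can be exercised without a running ROS2 system.

```python
from typing import Optional


class HeartbeatMonitor:
    """Flags nodes whose last heartbeat is missing or older than a timeout.

    Hypothetical sketch: timestamps (seconds) are passed in explicitly
    instead of being read from a ROS2 clock or topic.
    """

    def __init__(self, expected_nodes: list[str], timeout_s: float) -> None:
        self.timeout_s = timeout_s
        # None means "no heartbeat received yet".
        self.last_seen: dict[str, Optional[float]] = {n: None for n in expected_nodes}

    def heartbeat(self, node: str, t: float) -> None:
        """Record that `node` reported itself alive at time `t`."""
        if node not in self.last_seen:
            raise KeyError(f"unexpected node: {node}")
        self.last_seen[node] = t

    def unhealthy(self, now: float) -> list[str]:
        """Return the nodes that never reported or whose heartbeat timed out."""
        return sorted(
            node
            for node, t in self.last_seen.items()
            if t is None or now - t > self.timeout_s
        )


# Hypothetical node names based on the four nodes of the initial architecture.
monitor = HeartbeatMonitor(
    ["video_player", "object_track_detection", "stereo_depth", "decision_function"],
    timeout_s=0.5,
)
monitor.heartbeat("video_player", 10.0)
monitor.heartbeat("object_track_detection", 10.2)
monitor.heartbeat("stereo_depth", 9.0)
print(monitor.unhealthy(now=10.4))  # ['decision_function', 'stereo_depth']
```

With such a monitor, a node that crashes or hangs becomes visible to the rest of the system instead of silently starving its downstream nodes.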
Integration into SAFEXPLAIN middleware
One of the project's contributions is the definition of this ROS2-based middleware, which serves as the backbone for the case studies. The middleware provides the skeleton for the reference safety architecture pattern, so any CS integrated into it follows this pattern by design. As shown in Figure 2, the railway CS has been integrated into the middleware, enabling compliance with the reference safety architecture.
For the integration, the nodes and interactions of the CS's initial ROS2 architecture were mapped onto the nodes and interactions required by the reference architecture, which is shown in Figure 3. Figure 4 shows the result of this transformation: the railway CS embedded in the SE middleware following the reference architecture.
The integration process involves multiple steps. This article focuses on the AI-based subsystem and leaves the other components and nodes aside, since, for an AI-based CS, this subsystem contains the most important set of nodes.
This subsystem consists of several interacting nodes:
- AI/ML constituent node: Responsible for the AI processing, this node processes images, detects potential obstacles and calculates their distances, then sends this information to the decision function. Once integrated into the middleware, it also publishes the model's health status and the status of the preprocessing and post-processing steps. Redundancy is crucial here: the railway CS uses two different models, YoloV8 and SafeYolo. This diverse redundancy helps to diagnose and monitor potential model faults.
- L1 Diagnostic and Monitoring Mechanism node: This node diagnoses the AI subsystem and monitors the platform resources it uses. Because of the redundancy, it is deployed twice, once per AI/ML constituent. It processes multiple inputs: the images used by the AI/ML constituent, the status of each AI processing step, and the constituent's output. Techniques such as checking the consistency between the left and right images, checking the temporal consistency of each image stream, and monitoring system resources support the diagnosis and monitoring.
- Supervision function node: This node ensures that the AI system operates within a predefined safety envelope, for example by verifying that input images fall within the model's Operational Design Domain (ODD). This node is also redundant, with one instance supervising each AI/ML constituent.
- Decision function node: Aggregates multiple inputs to provide informed suggestions or decisions to the user and the safety control module. It integrates outputs from the AI/ML constituents, health statuses from the L1 modules, and supervision results. For example, in the railway CS, it alerts the train driver about potential obstacles with varying degrees of urgency.
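The per-step statuses published by the AI/ML constituent could be produced by wrapping each processing stage and recording whether it completed. This hypothetical sketch assumes a pipeline of named stages passed in as plain functions; `ConstituentStatus`, `run_constituent` and the dummy stages are illustrative names, not part of the SAFEXPLAIN middleware.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ConstituentStatus:
    """Hypothetical status message published alongside each inference result."""
    model: str
    step_ok: dict[str, bool] = field(default_factory=dict)
    error: str = ""

    @property
    def healthy(self) -> bool:
        # Healthy only if at least one step ran and every step succeeded.
        return bool(self.step_ok) and all(self.step_ok.values())


Step = tuple[str, Callable[[Any], Any]]


def run_constituent(model: str, frame: Any, steps: list[Step]) -> tuple[Optional[Any], ConstituentStatus]:
    """Run each processing step in order, recording per-step success.

    Stops at the first failing step and returns None as the result, so
    downstream nodes never receive output from a broken pipeline.
    """
    status = ConstituentStatus(model=model)
    data = frame
    for name, fn in steps:
        try:
            data = fn(data)
        except Exception as exc:  # broad on purpose: any failure marks the step as failed
            status.step_ok[name] = False
            status.error = f"{name}: {exc}"
            return None, status
        status.step_ok[name] = True
    return data, status


# Dummy stages standing in for real preprocessing, inference and post-processing.
steps = [
    ("preprocessing", lambda img: img),
    ("inference", lambda img: [("obstacle", 240.0)]),
    ("postprocessing", lambda dets: [d for d in dets if d[1] > 0]),
]
result, status = run_constituent("YoloV8", frame=[[0]], steps=steps)
print(result, status.healthy)  # [('obstacle', 240.0)] True
```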
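Some of the L1 checks listed above can be approximated with simple statistics, as in the following minimal sketch. It rests on several assumptions: images are small grayscale arrays stored as nested lists, left/right consistency is reduced to comparing mean brightness (a real stereo check would more likely compare disparity or matched features), temporal consistency compares consecutive left frames, and `L1Thresholds` and `l1_diagnose` are hypothetical names with hypothetical values.

```python
from dataclasses import dataclass

Image = list[list[int]]  # hypothetical: small grayscale image, values 0-255


def mean_intensity(img: Image) -> float:
    """Average pixel value over the whole image."""
    n = sum(len(row) for row in img)
    if n == 0:
        raise ValueError("empty image")
    return sum(sum(row) for row in img) / n


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute per-pixel difference between two same-sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    n = sum(len(row) for row in a)
    if n == 0:
        raise ValueError("empty image")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / n


@dataclass(frozen=True)
class L1Thresholds:
    """Hypothetical thresholds; real values would come from the safety analysis."""
    max_stereo_gap: float = 30.0    # max mean-brightness gap between left and right
    min_frame_change: float = 0.5   # below this, the camera may be frozen
    max_frame_change: float = 80.0  # above this, the frame jumped abruptly
    max_cpu_load: float = 0.9       # fraction of CPU capacity


def l1_diagnose(left: Image, right: Image, prev_left: Image,
                step_ok: dict[str, bool], cpu_load: float,
                th: L1Thresholds = L1Thresholds()) -> list[str]:
    """Return the detected faults; an empty list means 'healthy'."""
    faults = []
    # Stereo consistency (simplified): both cameras view the same scene,
    # so their global brightness should be similar.
    if abs(mean_intensity(left) - mean_intensity(right)) > th.max_stereo_gap:
        faults.append("stereo_inconsistent")
    # Temporal consistency: consecutive frames should change, but not abruptly.
    change = mean_abs_diff(left, prev_left)
    if change < th.min_frame_change:
        faults.append("frame_frozen")
    elif change > th.max_frame_change:
        faults.append("temporal_jump")
    # Status of each AI processing step reported by the AI/ML constituent.
    faults.extend(f"step_failed:{name}" for name, ok in step_ok.items() if not ok)
    # Platform resource monitoring.
    if cpu_load > th.max_cpu_load:
        faults.append("cpu_overload")
    return faults
```

One such function would run per redundant channel, matching the two L1 node instances described above.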
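To illustrate the supervision function's ODD check, the sketch below tests whether an image's brightness and contrast fall inside an assumed envelope. The bounds, the `within_odd` name and the use of brightness and contrast as ODD proxies are all hypothetical; the actual ODD of the railway CS is not specified in this article.

```python
import statistics

# Hypothetical ODD bounds for an 8-bit grayscale image.
ODD_MIN_BRIGHTNESS = 40.0
ODD_MAX_BRIGHTNESS = 220.0
ODD_MIN_CONTRAST = 15.0  # population std. dev.; very low contrast may indicate fog


def within_odd(pixels: list[int]) -> tuple[bool, list[str]]:
    """Check whether an image's brightness and contrast are inside the assumed ODD.

    Returns (inside, reasons), where `reasons` lists every violated bound.
    """
    if not pixels:
        return False, ["empty_image"]
    reasons = []
    brightness = statistics.fmean(pixels)
    contrast = statistics.pstdev(pixels)
    if brightness < ODD_MIN_BRIGHTNESS:
        reasons.append("too_dark")
    elif brightness > ODD_MAX_BRIGHTNESS:
        reasons.append("too_bright")
    if contrast < ODD_MIN_CONTRAST:
        reasons.append("low_contrast")
    return not reasons, reasons
```

Returning the list of violated bounds, rather than a bare boolean, lets the decision function explain why a channel was rejected.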
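The decision function's aggregation can be sketched as follows: channels whose L1 node reports a fault or whose supervision node reports an out-of-ODD input are discarded, and the remaining diverse channels (YoloV8 and SafeYolo in the railway CS) are cross-checked. The `ChannelReport` and `Decision` structures, the 50 m disagreement tolerance and the "keep the nearest obstacle" policy are assumptions made for illustration, not the project's documented logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelReport:
    """Hypothetical per-channel input: one AI/ML constituent plus its monitors."""
    model: str
    nearest_obstacle_m: Optional[float]  # None: no obstacle detected on the track
    l1_healthy: bool                     # verdict of the channel's L1 node
    within_odd: bool                     # verdict of the channel's supervision node


@dataclass(frozen=True)
class Decision:
    nearest_obstacle_m: Optional[float]  # conservative distance passed on for alerting
    degraded: bool                       # True when the result should be trusted less
    reason: str


def decide(reports: list[ChannelReport], tolerance_m: float = 50.0) -> Decision:
    """Fuse redundant channels, discarding untrusted ones and flagging disagreement."""
    trusted = [r for r in reports if r.l1_healthy and r.within_odd]
    if not trusted:
        # No channel can be relied on: report a degraded state with no distance.
        return Decision(None, True, "no_trusted_channel")

    distances = [r.nearest_obstacle_m for r in trusted if r.nearest_obstacle_m is not None]
    # Conservative choice: the closest obstacle reported by any trusted channel.
    nearest = min(distances) if distances else None

    if len(trusted) >= 2 and distances:
        one_missed = len(distances) < len(trusted)  # some model saw no obstacle
        too_far_apart = max(distances) - min(distances) > tolerance_m
        if one_missed or too_far_apart:
            return Decision(nearest, True, "models_disagree")

    if len(trusted) < len(reports):
        return Decision(nearest, True, "channel_discarded")
    return Decision(nearest, False, "ok")


print(decide([ChannelReport("YoloV8", 200.0, True, True),
              ChannelReport("SafeYolo", None, True, True)]))
# Decision(nearest_obstacle_m=200.0, degraded=True, reason='models_disagree')
```

Keeping the nearest obstacle when the models disagree reflects a fail-safe bias: the sketch prefers a possibly unnecessary alert to a missed obstacle, and the `degraded` flag lets the driver interface signal reduced confidence.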
Conclusions
The integration of the railway case study into the reference safety architecture pattern has enhanced the safety and reliability of the Automatic Train Operation (ATO) system. By embedding the case study into the ROS2-based SAFEXPLAIN middleware and the reference architecture, the project has implemented diagnostic and monitoring mechanisms that continuously supervise the AI subsystem and detect faults. The use of diverse AI models and comprehensive health management strategies strengthens the system against potential failures, enabling accurate and timely decision-making.