Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zephyr doesn't boot on STM32F4 (and probably other ARMv7-M cores) #71698

Closed
duda-patryk opened this issue Apr 19, 2024 · 15 comments
Closed

Zephyr doesn't boot on STM32F4 (and probably other ARMv7-M cores) #71698

duda-patryk opened this issue Apr 19, 2024 · 15 comments
Assignees
Labels
area: ARM ARM (32-bit) Architecture bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug

Comments

@duda-patryk
Copy link
Collaborator

Describe the bug
Zephyr can't boot on STM32F4 after #65071.

To Reproduce
Build Zephyr and run on STM32F4

Expected behavior
It boots

Impact
Very high. Zephyr doesn't work.

Logs and console output
No logs are printed.

Environment (please complete the following information):
N/A

Additional context
Reverting f11027d seems to fix the problem.

I compared assembly code and looks like commit above removed cpsie if instruction from arch_switch_to_main_thread().
For me, the following change fixes problem (ARMv7-M only!):

diff --git a/arch/arm/core/cortex_m/thread.c b/arch/arm/core/cortex_m/thread.c
index f9a030675b5..87900df9091 100644
--- a/arch/arm/core/cortex_m/thread.c
+++ b/arch/arm/core/cortex_m/thread.c
@@ -576,6 +576,7 @@ void arch_switch_to_main_thread(struct k_thread *main_thread, char *stack_ptr,
        "mov   r4,  %0\n"       /* force _main to be stored in a register */
        "msr   PSP, %1\n"       /* __set_PSP(stack_ptr) */
 
+       "cpsie if\n"            /* __enable_irq(); __enable_fault_irq() */
        "mov   r0,  #0\n"       /* arch_irq_unlock(0) */
        "ldr   r3, =arch_irq_unlock_outlined\n"
        "blx   r3\n
@duda-patryk duda-patryk added the bug The issue is a bug, or the PR is fixing a bug label Apr 19, 2024
@duda-patryk duda-patryk added the area: ARM ARM (32-bit) Architecture label Apr 19, 2024
@nordicjm
Copy link
Collaborator

Did you try with #71593 ?

@duda-patryk
Copy link
Collaborator Author

Did you try with #71593 ?

Yes

I realized that it fails to boot on warm reboots (cold reboots are fine).

@ithinuel
Copy link
Collaborator

It appears that I have missed something important around the (seemingly inconsistent) use of BASEPRI & cpsie/d… 🤔

Interrestingly, this isn’t caught by any of the tests I ran. I would like to fix the issue but also eventually add some tests to capture future regressions.
Could you please detail the process you’re following to reproduce the issue?
Do you use an in-tree example/test? How do you load/reload/reset?

@soburi
Copy link
Member

soburi commented Apr 21, 2024

I also encountered a similar phenomenon with arduino_uno_r4_minima (Renesas RA4M1, Cortex-M4/ARMv7-M).
Similar to this issue, I confirmed that reverting f11027d works correctly.
It is also working fine with @duda-patryk patch.

@ithinuel
Copy link
Collaborator

Thank you for letting me know. I am currently running tests on a nucleo_f401re but can’t seem to be able to reproduce the issue.

Along with the fix, I’d like to understand & document the why is this extra cpsie required.

@duda-patryk
Copy link
Collaborator Author

Have you tried jumping to reset vector with interrupts disabled?

@ithinuel
Copy link
Collaborator

I am running Zephyr’s tests & samples.
I get some false positives (ie failing test because of twister missing the start of the log) but they match before & after the changes to arch/arm.

Would you be able to provide a sample project that exhibits this issue?

@duda-patryk
Copy link
Collaborator Author

In my case, I'm jumping to Zephyr reset vector from another code (chain-loading) with interrupts disabled (cpsid i).

To reproduce issue without chain-loading you can use following code:

void z_arm_reset(void);

void test(void)
{
        __disable_irq();
        z_arm_reset();
}

@ithinuel
Copy link
Collaborator

So I edited samples/hello_world/src/main.c such that it becomes:

#include "zephyr/kernel.h"
#include <stdio.h>

extern void z_arm_reset(void);

int main(void) {
    printf("Hello World! %s\n", CONFIG_BOARD_TARGET);

    k_sleep(Z_TIMEOUT_MS(2000));

    __disable_irq();
    z_arm_reset();

    /* should not be reached */
    printf("Reboot failed\n");
    k_sleep(Z_FOREVER);

    return 0;
}

And this indeeds behaves incorrectly as the sleep is not stopping the thread for the requested amount of time.

However, z_arm_reset is not a proper way to reset the system, sys_reboot(_) from zephyr/sys/reboot.h is. Calling sys_reboot(SYS_REBOOT_WARM) or sys_reboot(SYS_REBOOT_COLD) does work as expected.

With reguards to chain loading, I also tried MCUBoot with the helloworld (using sys_reboot(SYS_REBOOT_WARM) and sys_reboot(SYS_REBOOT_COLD)) and it all seems to work as expected too.

For the record, here’s how I built & flashed them:

west build --sysbuild -p -b nucleo_f401re samples/hello_world -DSB_BOOTLOADER=MCUBOOT=y
pyocd load build/mcuboot/zephyr/zephyr.elf
pyocd load build/hello_world/zephyr/zephyr.signed.hex

It might be worth noting that the documentation for BOOTLOADER_MCUBOOT states that:

 By default, this option instructs Zephyr to initialize the core architecture HW registers during boot, when this is supported by the application. This removes the need by MCUboot to reset the core registers' state itself.

Could this be something useful for your usecase?

@duda-patryk
Copy link
Collaborator Author

MCUBoot works because they use irq_lock() to disable interrupts (faults are enabled) https://github.com/mcu-tools/mcuboot/blob/d2e69bf720ec4e5825f1bd7be441a9e3dedb5533/boot/zephyr/main.c#L205

@yperess do you remember why we disable interrupts using cpsid i?

@ithinuel
Copy link
Collaborator

CONFIG_MCUBOOT_CLEANUP_ARM_CORE defaults to y.

When the cleanup is enabled, do_boot() calls cleanup_arm_nvic() which disables interrupts too 🤔.

@duda-patryk
Copy link
Collaborator Author

When the cleanup is enabled, do_boot() calls cleanup_arm_nvic() which disables interrupts too

__disable_irq() is called indeed. @ithinuel I assume that it works for you 😅

@ithinuel
Copy link
Collaborator

Well, I cannot reproduce the issue but from the bug report what's missing is enabling the IRQ so if anything it only confuses me more.

@ithinuel
Copy link
Collaborator

Are you happy closing this issue with the fix that was merged?

@aescolar
Copy link
Member

Closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
Development

No branches or pull requests

6 participants