Deadlock issue in GNSS when SMP is on
-
I mentioned this issue initially in Shared Memory tiles with activated SMP, but the topic is different. So this is a dedicated topic now.
I am facing a deadlock when I have SMP on and try to use GNSS.
The last printf I can see is inside cxd56_gnss_default_sighandler.
    static void cxd56_gnss_default_sighandler(uint32_t data, FAR void *userdata)
    {
      FAR struct cxd56_gnss_dev_s *priv = (FAR struct cxd56_gnss_dev_s *)userdata;
      int i;
      int ret;
      int dtype = CXD56_CPU1_GET_DATA(data);
      printf("cxd56_gnss_default_sighandler %d %d\n", getpid(), dtype);
      fflush(stdout);
      switch (dtype)
I know that the program cannot print all printfs directly before entering deadlock. They are cut off, sometimes more, sometimes less. I tried to move on with a debugger and breakpoints.
The sequence is:
- CXD56_GNSS_NOTIFY_TYPE_REQBKUPDAT
- I deleted the backup to make sure it is not failing due to corrupted backup data. It was about 10KB of data before. I also tried to use sd0 as a path to prevent using farapi to get the data.
- CXD56_GNSS_NOTIFY_TYPE_BOOTCOMP
- The last step I have seen is the call to nxsem_post(&priv->syncsem). After that, cxd56_gnss_wait_notify never returns inside cxd56_gnss_open.
- I tried a hack to remove this semaphore and just sleep for 5 seconds, which is the timeout. Then I get stuck in the case CXD56_GNSS_NOTIFY_TYPE_REQCEPDAT.
I tried to pin all tasks and threads I found, such as gnss_receiver, to CPU0. So even with SMP on, everything should be pinned to CPU0.
I know that with SMP the locking behavior changes, for example in enter_critical_section(). My assumption is that this change in locking behavior causes the deadlock. I can see that at least these files are related to GNSS: cxd56_cpu1signal.c, cxd56_gnss.c, cxd56_cpufifo.c, cxd56_farapi.c and cxd56_icc.c.
I also tried this on another Spresense board to exclude a hardware issue as I soldered an external GNSS antenna onto the board.
I will prepare the code I am using to reproduce. I will strip down everything not needed. No additional hardware will be required. It will take a few days.
As there is no documentation on the inter-CPU communication or on the protocol used to talk to the GNSS MCU, and the GNSS firmware is closed source, I need help for further analysis.
-
More comments
- I configured CONFIG_STACK_USAGE_SAFE_PERCENT=95. It told me that gnss_receiver went to 100% (it said 1000 of 1000? CONFIG_CXD56CPU1_WORKER_STACKSIZE should be 1024 ... but there is an alignment modification inside up_create_stack?) -> I changed it to 8000 and the issue went away.
- I applied my fix described in "farapi can get into deadlock when SMP is on".
-
@jens6151-0-1-1
The GNSS measurement runs on another CPU. The application CPU sends commands to the GNSS CPU and waits for its response. But the GNSS core always sends its reply to one fixed CPU, the main core. So if the application sends a command to the GNSS CPU while running on a core other than the main core, the GNSS response cannot reach that CPU.
So it looks like a deadlock.
To fix it, use taskset to pin the task that uses GNSS to the main CPU.
-
Findings while stripping down the application and getting feedback from the "10n2 - Smart Student Driver Assistant" creators about pinning GNSS to a dedicated core.
- Setting the GNSS time with CXD56_GNSS_IOCTL_SET_TIME will freeze the system (crash the GNSS?) -> leave it out
- Pinning the gnss_receiver thread and the thread that reads the GNSS to the same CPU will freeze the system -> pin to dedicated CPUs.
- Using signals will freeze the system. -> Use polling with fds.
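For anyone wanting to try the "polling with fds" route, here is a minimal sketch of what that could look like. It only uses the generic POSIX poll() and read() calls. The helper names wait_readable and read_when_ready are made up for illustration, and opening the GNSS device and starting positioning are not shown, so treat this as a starting point under those assumptions and not as the driver's documented usage.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd has data to read.
 * Returns 1 if readable, 0 on timeout, -1 on error or if poll reports
 * a condition other than readable data (e.g. error or hang-up).
 */
int wait_readable(int fd, int timeout_ms)
{
  struct pollfd pfd;
  int ret;

  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  /* Retry if a signal interrupts the wait (the timeout restarts). */
  do
    {
      ret = poll(&pfd, 1, timeout_ms);
    }
  while (ret < 0 && errno == EINTR);

  if (ret <= 0)
    {
      return ret; /* 0 = timeout, -1 = error */
    }

  return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Hypothetical helper: read up to len bytes once fd becomes readable.
 * Returns the number of bytes read, 0 on timeout (or end of file),
 * -1 on error.
 */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
  int ready = wait_readable(fd, timeout_ms);

  if (ready <= 0)
    {
      return ready;
    }

  return read(fd, buf, len);
}
```

The idea would be to call read_when_ready in a loop on the GNSS file descriptor from the reader thread, so that no signal handler is involved on the application side.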
-
@CamilaSouza The system already takes care of switching the task to the main core for the communication. See arch/arm/src/cxd56xx/cxd56_farapi.c line 192++. Even if I pin the task to a core, that setting is overridden during the communication and changed back after it has completed.
-
The source code is now available at https://github.com/jens6151/bicycleComputer-on-spresense.
The project description is at https://www.hackster.io/jens6151/bicycle-computer-on-spresense-b0e332
Overall:
- Setting the GNSS time with CXD56_GNSS_IOCTL_SET_TIME will freeze the system (crash the GNSS?) -> leave it out
- Pinning the gnss_receiver thread and the thread that reads the GNSS to the same CPU will freeze the system -> pin to dedicated CPUs.
- gnss_receiver should get a dedicated core. Sharing the core might work, but it depends on the application.
- Using signals will freeze the system. -> Use polling with fds.
- Do not pin anything to core 0.
- Also make sure that any thread used in libraries is pinned, so that it does not accidentally block core 0. It might work without this, but not always.
Btw there is now a release 3.0.0 of the Spresense SDK. It might be that some issues got fixed.
https://developer.sony.com/develop/spresense/docs/release_sdk_en.html
-
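A rough sketch of pinning from code instead of taskset is below. It is written against the Linux/glibc-style sched_setaffinity() interface with cpu_set_t and the CPU_* macros. As far as I know NuttX offers the same calls when SMP is enabled, but please verify that for your SDK version. The function name pin_current_task_to_cpu is made up, and the CPU index is a parameter because which core works depends on the findings above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes the CPU_* macros with this set */
#endif
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling task/thread to one CPU.
 * Returns 0 on success, -1 on failure with errno set.
 */
int pin_current_task_to_cpu(int cpu)
{
  cpu_set_t set;

  /* Reject indices that do not fit into the CPU set. */
  if (cpu < 0 || cpu >= CPU_SETSIZE)
    {
      errno = EINVAL;
      return -1;
    }

  CPU_ZERO(&set);
  CPU_SET(cpu, &set);

  /* A pid of 0 addresses the caller itself. */
  return sched_setaffinity(0, sizeof(set), &set);
}
```

Each thread would call this once at startup with its own core number. Threads created inside libraries need the same treatment, which may mean setting the affinity before they are created so that they inherit it, if the platform supports inheritance.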
Hey @jens6151-0-1-1
I'm so glad to see you've posted your project on GitHub!
It is a great source of knowledge for other Spresense developers.
Also, thank you for the mention as Spresense Support. I was very honored.
When I have some extra time on my hands, I plan on going through your project in more detail.
Also, you've been a great resource on our forum. Thanks for your contribution!