Random application crash on Apple Silicon M1 (Qt 6.2.3)
-
Are both of the Qt builds you used x86_64-based, or was the one from brew ARM-based?
-
Both builds are ARM, I cannot build my app for x86_64 on my Mac M1 because I do not have libs for x86_64.
-
If your dependencies can be satisfied by brew, you can still install them for x86_64. It requires two brew installations though.
-
Yes, they can be satisfied by brew, but I am not sure I want to do it because I do not have an Intel Mac anymore, so I cannot test it on an Intel machine. I am thinking about installing a virtual ARM Linux and giving it a try there to see if the problem is ARM related or macOS related.
-
I was just thinking about testing your application with Rosetta2 to see if it behaved in the same fashion.
-
@KejPi Are you using multi-threading in your app? At WWDC 2020 „Port your mac app to Apple Silicon“ there was a slide reminding developers that
- Intel CPUs and Apple Silicon have a different memory ordering model
- A data race on Intel might have appeared to be benign, but can be causing crashes on Apple Silicon
- Rosetta provides Intel CPU memory ordering
If your app works correctly under Rosetta, then I would guess you have one of the problems mentioned above.
You might also have a problem with memory alignment; maybe this thread can give additional insight.
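To make the memory-ordering point concrete, here is a minimal sketch (not taken from the OP's code; `Mailbox` and `publishAndConsume` are made-up names). One thread writes plain data and then raises a flag, the other waits for the flag and reads the data. With a plain `bool` flag this is a data race that may appear to work on Intel and fail on Apple Silicon; with a release store and an acquire load the payload is guaranteed to be visible on both.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example type: a payload guarded by an atomic "ready" flag.
struct Mailbox {
    int payload = 0;                 // plain data, written before the flag is raised
    std::atomic<bool> ready{false};  // the only variable both threads poll
};

// Publishes `value` from the calling thread and returns what the consumer saw.
int publishAndConsume(int value)
{
    Mailbox box;
    int seen = -1;

    std::thread consumer([&] {
        // acquire: once ready == true is observed, the payload write is visible too
        while (!box.ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = box.payload;
    });

    box.payload = value;
    // release: everything written above is published together with the flag
    box.ready.store(true, std::memory_order_release);

    consumer.join();  // join() synchronizes, so reading `seen` here is safe
    return seen;
}
```

Calling `publishAndConsume(42)` should return 42 on any architecture; the same code with a non-atomic flag has no such guarantee.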
-
That is an interesting point. Actually I use several threads: the input rtl-sdr driver starts a thread, the backend DAB SDR library runs in a normal POSIX thread, and there are some GUI classes running in QThreads. In total it is about 14 threads. The threads that I create explicitly are started when the application starts; I am not sure about other threads that I do not control directly, but the application crashes after some time of running. Nevertheless, your hint about the x86_64 build seems more and more reasonable, so I will try to build the libs and the application for x86_64 and see what happens.
-
@DerReisende The discussion thread you have posted touches on atomic access. My code actually relies on this functionality to share a few simple control variables between threads, to avoid using mutexes, which are much heavier. Could this be an issue?
-
@KejPi said in Random application crash on Apple Silicon M1 (Qt 6.2.3):
there are some GUI classes running in QThreads
mhm, is that something you have done?
Because GUI elements/classes are not allowed to run in a thread other than the one QCoreApplication lives in!
-
@KejPi said in Random application crash on Apple Silicon M1 (Qt 6.2.3):
@DerReisende The discussion thread you have posted touches on atomic access. My code actually relies on this functionality to share a few simple control variables between threads, to avoid using mutexes, which are much heavier. Could this be an issue?
It also talks about memory alignment of structs etc., where you should memcpy received data instead of, e.g., just casting to a struct, which may cause problems. But I am not really familiar with MT programming; I just read those things while watching WWDC 2020 and searching for additional info. But IMHO data exchange with a std::atomic or equivalent should be fine. I suggested compiling with UBSan only because it may flag unintentional undefined behaviour that exists in your code and might be the cause of your problems. And clang provides these checks OOTB, therefore it might be worth a try.
I have so far not used Qt with MT therefore I would check @J-Hilk suggestions as well.
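As an illustration of the memcpy point, a small sketch under assumptions: `PacketHeader` and `readHeader` are invented for this example and are not from the OP's code. Instead of casting a pointer into a byte buffer to a struct pointer (which can violate alignment and aliasing rules), the bytes are copied into a properly aligned local object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical wire format, for illustration only.
struct PacketHeader {
    std::uint16_t id;
    std::uint32_t length;
};

// memcpy into a struct is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<PacketHeader>::value,
              "PacketHeader must be trivially copyable");

// Copies a header out of a raw byte buffer. `offset` may be any value,
// including one that would leave a PacketHeader* misaligned.
PacketHeader readHeader(const unsigned char* buffer, std::size_t offset)
{
    PacketHeader header;
    std::memcpy(&header, buffer + offset, sizeof(header));
    return header;
}
```

The risky variant would be `*reinterpret_cast<const PacketHeader*>(buffer + offset)`; compilers usually optimize the memcpy version to equally cheap code.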
-
@DerReisende said in Random application crash on Apple Silicon M1 (Qt 6.2.3):
where you should memcpy received data instead of e.g. just casting to a struct which may cause problems.
woa, we're talking C++ here, right? Casting memory to a struct is undefined behaviour, among other things because it circumvents the constructor! You get away with it in C but not in C++, that's why dynamic_cast exists
-
@J-Hilk said in Random application crash on Apple Silicon M1 (Qt 6.2.3):
@KejPi said in Random application crash on Apple Silicon M1 (Qt 6.2.3):
there are some GUI classes running in QThreads
mhm, is that something you have done?
Because GUI elements/classes are not allowed to run in a thread other than the one QCoreApplication lives in!
I did not word it correctly; I did not want to go into details. My application consists of 2 parts: a backend doing DAB demodulation and decoding, written in C, and the HMI part (frontend), written in C++ (Qt), which controls the backend and the input sources, does audio and data decoding and playback, etc. In the HMI part I have developed several classes and some of them run in separate threads, but not those that have GUI elements, only some decoders, input devices, etc.
-
@J-Hilk said in Random application crash on Apple Silicon M1 (Qt 6.2.3):
woa, we're talking C++ here, right? Casting memory to a struct is undefined behaviour, among other things because it circumvents the constructor! You get away with it in C but not in C++, that's why dynamic_cast exists
I do not do this :-) I try to do things the C++ way where possible, but I have to admit I am trying to optimize all real-time code as much as possible, like using std::atomic instead of a mutex, etc.
-
@J-Hilk said in Random application crash on Apple Silicon M1 (Qt 6.2.3):
@DerReisende said in Random application crash on Apple Silicon M1 (Qt 6.2.3):
where you should memcpy received data instead of e.g. just casting to a struct which may cause problems.
woa, we're talking C++ here, right? Casting memory to a struct is undefined behaviour, among other things because it circumvents the constructor! You get away with it in C but not in C++, that's why dynamic_cast exists
I have plenty of old code from former C programmers who used reinterpret_cast et al. to convince the compiler to do what they wanted in really creative ways (created before Y2K).
But anyway, it was just a (maybe bad) example showing that the linked thread did not only talk about atomic ops.
If the OP doesn't do it, good. But a build with the undefined behaviour sanitizer may still find some problems whose fix could cure the intermittent crashes of the app.
-
I have compiled my app with undefined behaviour sanitizer. So far I get a lot of messages like this (not sure how to get rid of them) - they are generated for every Q_OBJECT class I have created:
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Users/kejpi/Devel/dab/build-gui-Qt_6_2_3-Debug/AbracaDABra_autogen/EWIEGA46WW/moc_audiodecoder.cpp:172:28 in
/Users/kejpi/Devel/dab/build-gui-Qt_6_2_3-Debug/AbracaDABra_autogen/EWIEGA46WW/moc_slideshowapp.cpp:145:28: runtime error: member access within address 0x6000025a1d00 which does not point to an object of type 'QObjectData'
0x6000025a1d00: note: object is of type 'QObjectPrivate'
00 00 00 00 90 0d 18 08 01 00 00 00 c0 ef eb 01 00 60 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            ^~~~~~~~~~~~~~~~~~~~~~~
            vptr for 'QObjectPrivate'
And then I was able to find 2 minor issues in my code (shifting of a negative number and an undefined enum value that was not used anyway). I have also found 1 misalignment that may cause problems. I have fixed all 3 issues and the application is still crashing with a message like this:
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==3606==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x0001be9aaf8c sp 0x00016da86520 T42532)
==3606==Hint: pc points to the zero page.
==3606==The signal is caused by a UNKNOWN memory access.
==3606==Hint: address points to the zero page.
#0 0x0 (<unknown module>)
==3606==Register values:
x[0] = 0x00000001beba354f x[1] = 0x0000000000000000 x[2] = 0x00000001bea02d4c x[3] = 0x00000001bea02da4
x[4] = 0x00000001548a9118 x[5] = 0x00000001548a9128 x[6] = 0x00000001548a9210 x[7] = 0x0000600000b8e800
x[8] = 0x0000000000000003 x[9] = 0x0000000000000003 x[10] = 0x000000000000001d x[11] = 0x0000000000000000
x[12] = 0x0000000000000005 x[13] = 0x0000000155087330 x[14] = 0x00000001beba7028 x[15] = 0x000000020e105458
x[16] = 0x0000000000000000 x[17] = 0x000000020fdb2b90 x[18] = 0x0000000000000000 x[19] = 0x000000020fdaacb0
x[20] = 0x00000001548a9128 x[21] = 0x00000001548a9210 x[22] = 0x0000600001efe240 x[23] = 0x0000600000b8eae0
x[24] = 0x000060000378d600 x[25] = 0x0000600000b8e840 x[26] = 0x0000600002fd1200 x[27] = 0x0000600002fd0fc0
x[28] = 0x0000600003297020 fp = 0x000000016da868b0 lr = 0x00000001be9aaf8c sp = 0x000000016da86520
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV (<unknown module>)
==3606==ABORTING
My reading is that a NULL pointer is being accessed somewhere.
-
Maybe -fsanitize=thread (thread sanitizer) is worth a try as well, as it is supposed to find data races (Docs).
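For reference, a tiny sketch of the kind of code this is about (the function name `countWithTwoThreads` is made up for this example). Two threads bump a shared counter; if `counter` were a plain `long`, a build with `-fsanitize=thread` on clang or gcc would be expected to report a data race on it, while the `std::atomic` version below is race-free.

```cpp
// Possible build line (file name is an assumption):
//   clang++ -std=c++17 -g -fsanitize=thread race.cpp
#include <atomic>
#include <cassert>
#include <thread>

// Increments a shared counter from two threads and returns the total.
long countWithTwoThreads(long perThread)
{
    std::atomic<long> counter{0};  // a plain `long` here would be a data race

    auto work = [&] {
        for (long i = 0; i < perThread; ++i) {
            // relaxed is enough: only the final sum matters, not ordering
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load();
}
```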
Otherwise I am running out of ideas.
-
@KejPi Hmm, I found this document from Apple which may be useful.
As far as I understand, ARM stores the return address in the link register. Therefore, if I understood correctly, the lr register in your crash dump should point to the code where the fault was triggered. Maybe this applies to Apple Silicon as well…
-
I have tried to run it with the thread sanitizer and quite a lot of data races are reported, mostly related to the way I share data between the backend (C library) and the HMI/application (Qt). It makes me think that maybe I have some systematic issue :-( This is what I have:
- Backend is C library that runs in Posix thread (T1)
- Data from backend is passed using callback function
- There is dedicated class called RadioControl that is running in QThread (T2). This class does all the communication with backend and communicates with other classes by signals
- Callback functions are declared as friend to RadioControl (static function should work as well IMO)
- I need to take the data in the callback function (which runs in T1) and pass it to the RadioControl thread T2. I am doing it using a signal and Qt::QueuedConnection inside the RadioControl class.
Implementation extract:
radiocontrol.h
class RadioControl : public QObject
{
    Q_OBJECT
public:
    bool init();
    // ...
signals:
    void dabEvent(RadioControlEvent * pEvent);
    // ...
private:
    void eventFromDab(RadioControlEvent * pEvent);
    void emit_dabEvent(RadioControlEvent * pEvent) { emit dabEvent(pEvent); }
    friend void dabNotificationCb(dabProcNotificationCBData_t * p, void * ctx);
};
radiocontrol.cpp
bool RadioControl::init()
{
    connect(this, &RadioControl::dabEvent, this, &RadioControl::eventFromDab, Qt::QueuedConnection);
    // passing this in last argument as context (ctx)
    dabProcRegisterNotificationCb(dabProcHandle, dabNotificationCb, (void *) this);
    //...
}
void RadioControl::eventFromDab(RadioControlEvent * pEvent)
{
    switch (pEvent->type) // <=== sanitizer complains that this is the data race
    {
    case RadioControlEventType::SYNC_STATUS:
    {
    }
    break;
    //...
    }
}
void dabNotificationCb(dabProcNotificationCBData_t * p, void * ctx)
{
    RadioControl * radioCtrl = static_cast<RadioControl *>(ctx);
    switch (p->nid)
    {
    case DABPROC_NID_SYNC_STATUS:
    {
        RadioControlEvent * pEvent = new RadioControlEvent;
        const dabProc_NID_SYNC_STATUS_t * pInfo = static_cast<const dabProc_NID_SYNC_STATUS_t *>(p->pData);
        pEvent->type = RadioControlEventType::SYNC_STATUS; // <=== sanitizer complains that this is the data race
        pEvent->status = p->status;
        pEvent->pData = static_cast<intptr_t>(pInfo->syncLevel);
        radioCtrl->emit_dabEvent(pEvent);
    }
    break;
    // ...
    }
}
Do you see any fundamental issue with this approach?
-
void RadioControl::eventFromDab(RadioControlEvent * pEvent)
{
    switch (pEvent->type) // <=== sanitizer complains that this is the data race
This one is: you're passing around a pointer to an object that lives on another thread. The QueuedConnection will only copy the pointer, not the underlying data.
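One way to avoid sharing the pointed-to object is to hand the event over by value. Below is a Qt-free sketch of that idea using only the standard library; `Event`, `EventQueue` and `sendAcrossThreads` are hypothetical stand-ins, not the OP's classes. In Qt the rough equivalent would be to declare the signal with a value parameter instead of a pointer, which for queued connections requires the type to be known to the meta-type system (e.g. via qRegisterMetaType).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for RadioControlEvent.
enum class EventType { SyncStatus };

struct Event {
    EventType type;
    std::intptr_t data;
};

// Minimal blocking queue: events are copied in and copied out under a mutex,
// so producer and consumer never touch the same Event object.
class EventQueue {
public:
    void push(Event e)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(e);
        }
        cv_.notify_one();
    }

    Event pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        Event e = queue_.front();
        queue_.pop();
        return e;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Event> queue_;
};

// Sends one event from the calling thread ("backend") to a consumer thread
// ("RadioControl") and returns the payload the consumer received.
std::intptr_t sendAcrossThreads(std::intptr_t syncLevel)
{
    EventQueue queue;
    std::intptr_t received = -1;

    std::thread consumer([&] { received = queue.pop().data; });
    queue.push(Event{EventType::SyncStatus, syncLevel});

    consumer.join();  // join() makes the write to `received` visible here
    return received;
}
```

The copy costs a few bytes per event, but it removes the question of who owns the object and when it may be deleted.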