How to debug / backtrace the call stack?

abhisit · 3 Jul 2019, 03:17

Hi All,

I got the segmentation fault below and would like to get debug information for it. Running GDB on the unit is not possible: the command is not installed, and it would be very slow because the CPU has few resources. That leaves remote debugging, but remote debugging is not convenient with our automated test tools (which power the unit on and off and send control commands to it).

[   49.421091] App[3000]: unhandled level 0 translation fault (11) at 0x98968000989710, esr 0x92000004
[   49.421098] pgd = ffffffc039eff000
[   49.421101] [98968000989710] *pgd=000000003af06003
[   49.421103] , *pud=000000003af06003
[   49.421105] , *pmd=0000000000000000
[   49.421107]
[   49.421108]
[   49.421114] CPU: 2 PID: 3000 Comm: App Not tainted 4.9.0-yyyyyy #6
[   49.421116] Hardware name: xxxx,zzzzz (DT)
[   49.421119] task: ffffffc055b89380 task.stack: ffffffc03acd8000
[   49.421124] PC is at 0x7f7d5d9050
[   49.421126] LR is at 0x7f7d6006e8
[   49.421128] pc : [<0000007f7d5d9050>] lr : [<0000007f7d6006e8>] pstate: 60000000
[   49.421130] sp : 0000007ff0c62720
[   49.421132] x29: 0000007ff0c62720 x28: 0000007f7d25ef18
[   49.421136] x27: 0000000000000000 x26: 000000000000000d
[   49.421141] x25: 00000000147a6b00 x24: 0000000000000090
[   49.421145] x23: 0000000014704360 x22: 0000000000000048
[   49.421149] x21: 0000000014704260 x20: 00000000147a7e70
[   49.421153] x19: 0098968000989680 x18: 000000000000016e
[   49.421157] x17: 0000007f7d5d9040 x16: 0000007f7d807098
[   49.421161] x15: 0000000000000000 x14: 0000000000000391
[   49.421165] x13: ffffffffffff0000 x12: 0000000000000000
[   49.421169] x11: 0000000000000010 x10: 00000000145dfb40
[   49.421173] x9 : 0000007f7ca0e508 x8 : 00000000145dfb60
[   49.421177] x7 : 0000000000000000 x6 : 0000000000000000
[   49.421181] x5 : 0000007f7ca0c9b8 x4 : 00000000ffffffff
[   49.421185] x3 : 00000000145dfb60 x2 : 0000000000000048
[   49.421189] x1 : 0000000000000000 x0 : 00000000147a6b00

I plan to use backtrace() and backtrace_symbols() from "execinfo.h".

Can I use try...catch() like in the code below?

Will I also see a segmentation fault that happens somewhere else, for example in another plugin? And why do a lot of people override the notify method and put the try...catch() there?

int main(int argc, char *argv[]) try 
{
    QCoreApplication a(argc, argv);
    ...
    ... // Load some plugins here.
    ...
    return a.exec();
}
catch (...)
{
    dumptrace(); // This function is used to dump the call stack by backtrace() and backtrace_symbols().
}

Thank you,
Abhisit.

jsulm · 21 May 2019, 10:53

@abhisit said in How to debug / backtrace the call stack?:

Can I use try...catch() like in the code below?

No, there is not even a try block in that code and you can't put main() call into a try block as it is called by the loader of the OS, not by you.
You can't use exceptions to catch segmentation faults. You need to run a debug build of your app to see where exactly it crashes.
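
For reference, a `try` placed directly after a function's parameter list, as in the question, is a C++ function-try-block. It does catch C++ exceptions thrown from the function body, but it never sees a segmentation fault, because that is delivered by the kernel as a signal and not thrown as an exception. A minimal sketch of what such a block can do, using a hypothetical `runApp()` function that is not from the original code:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an application entry point.
// The "try" after the parameter list makes the whole body a function-try-block.
std::string runApp(bool fail) try
{
    if (fail)
        throw std::runtime_error("plugin failed");  // a real C++ exception
    return "ok";
}
catch (const std::exception &e)
{
    // Reached only for C++ exceptions, never for SIGSEGV.
    return std::string("caught: ") + e.what();
}
```

Calling `runApp(true)` ends up in the handler; dereferencing an invalid pointer inside the body would terminate the process without ever reaching it.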

LiaoXuewei · 20 May 2019, 07:00

I'm also looking for a solution to this problem. There seems to be no good solution, especially for cross-platform problems.

If you have used Java, you know it is easy to get the call stack there, but in C++ it is difficult.

aha_1980 · 21 May 2019, 08:01

@LiaoXuewei the typical solution is to run gdbserver on the board and connect to it with gdb from the development machine.

Regards

LiaoXuewei · 21 May 2019, 08:01

@aha_1980 I just want to get the call stack in the production environment. With the call stack, most problems can be solved easily.
I feel that your solution is too difficult to implement there: I have tens of thousands of users, so every problem comes at a great cost. For something this important, does C++ really not have any simple and reliable solution?

JonB · 21 May 2019, 11:14

@jsulm

You can't use exceptions to catch segmentation faults. You need to run a debug build of your app to see where exactly it crashes.

Given that @LiaoXuewei seems to be using gdb on Linux(?), can't he try catching the SIGSEGV signal and then use backtrace functions to still access the stack? I know he must not continue running after that, but it might be enough just to see the trace. I suggest reading, say, https://stackoverflow.com/questions/2350489/how-to-catch-segmentation-fault-in-linux for ideas. He may need to Google around; this is not really a Qt question.
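
A minimal sketch of that idea, assuming glibc (which provides `execinfo.h`) and a POSIX system. The names `dumptrace()`, `onFatalSignal()` and `installCrashHandler()` are made up for illustration. `backtrace_symbols_fd()` is used instead of `backtrace_symbols()` because it writes straight to a file descriptor without calling `malloc()`, which is not safe inside a signal handler:

```cpp
#include <execinfo.h>   // backtrace(), backtrace_symbols_fd() (glibc)
#include <signal.h>
#include <unistd.h>

// Write the current call stack to file descriptor fd.
// Returns the number of frames captured.
int dumptrace(int fd)
{
    void *frames[64];
    int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, fd);
    return count;
}

// Signal handler: print the stack, then let the signal kill the process.
extern "C" void onFatalSignal(int sig)
{
    static const char msg[] = "Fatal signal, call stack:\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    dumptrace(STDERR_FILENO);
    // SA_RESETHAND already restored the default action, so re-raising
    // terminates the process (and can still produce a core dump).
    raise(sig);
}

// Install the handler for SIGSEGV. Returns true on success.
bool installCrashHandler()
{
    // Call backtrace() once up front: its first use may load libgcc
    // dynamically, which should not happen inside the handler.
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa = {};
    sa.sa_handler = onFatalSignal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

`installCrashHandler()` would be called once at the start of `main()`. To get function names rather than bare addresses, the application typically has to be linked with `-rdynamic`; otherwise the printed addresses can be resolved afterwards on the development machine. This only works while the stack itself is still usable, so it is a best-effort aid and not a replacement for a core dump.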

jsulm · 21 May 2019, 11:41

@JonB @LiaoXuewei Actually, core dumps are used for that use case. You configure your machine so that it automatically creates a core dump when a process crashes (see https://wiki.archlinux.org/index.php/Core_dump). Then you need to get the core dump from your customer and open it with GDB:

gdb EXECUTABLE CORE_DUMP

Then you can analyse the crash. The executable must be the same version as the one used by the customer.
If you're using a release build of the libraries on customer machines (which you should do), you need to load the debug symbols in GDB as well to get a meaningful stack trace from the core dump (see the GDB documentation for how to do so).
To create separate files containing the debug symbols, see https://stackoverflow.com/questions/866721/how-to-generate-gcc-debug-symbol-outside-the-build-target
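
Besides configuring the machine, a process can also raise its own core-file size limit at startup. A minimal sketch, assuming a POSIX system; `enableCoreDumps()` is a hypothetical helper name:

```cpp
#include <sys/resource.h>

// Raise the soft core-file size limit to the hard limit for this process.
// Returns true on success.
bool enableCoreDumps()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;
    lim.rlim_cur = lim.rlim_max;  // cannot exceed the hard limit without privileges
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

This has no effect if the hard limit is already 0, and where the core file ends up (or whether it is piped to a crash-reporting tool) still depends on the system's configuration, for example `/proc/sys/kernel/core_pattern` on Linux.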

JonB · 21 May 2019, 12:01

@jsulm
This is good only if the end user is comfortable with, and able to, produce core dumps and supply them to you, the developer. (For example, a Linux system may [well] be set up not to produce core dumps [e.g. that's the default under Ubuntu]. And some sites may not be prepared to supply you with a core dump for confidentiality reasons.)

So this might be OK for OP, but if not that's why I suggested a possible way forward strictly within the code.

jsulm · 21 May 2019, 12:23

@JonB As far as I know, what you suggested is not possible in a portable way; see the last post in the link you posted.

JonB · 21 May 2019, 12:01

@jsulm
Yep, absolutely, there is much to read and fiddle with, and it may not be portable. It may or may not work for OP's particular case. Anyway, whether he goes with this or with your core dump suggestion (OP should look up whether his target Linux OS allows core dumps and how it handles them), he has a few approaches to consider now.