Debugging failing apps on the Mac

Peter van der Linden, last update Mar 10 2003

Core dumps are written to the /cores directory. Cores from system images may also turn up in /private/cores.

For a user application, however, the OS tries to produce a more readable plain-text crash log, which goes in ~/Library/Logs/CrashReporter. Here's a crash report from a recent Safari crash I got:

The report covers about 20 threads, each with its own stack backtrace. This one crashed because it tried to dereference a null pointer. Note that some of the threads appear to be Java threads.
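If you just want the newest report without browsing in the Finder, a short script can list the CrashReporter directory by modification time. This is a minimal sketch: `crash_reports` is a hypothetical helper, and it makes no assumption about how the log files are named, only that they live in ~/Library/Logs/CrashReporter.

```python
from pathlib import Path


def crash_reports(directory=None):
    """Return the files in `directory` (by default the current user's
    ~/Library/Logs/CrashReporter), most recently modified first."""
    if directory:
        directory = Path(directory)
    else:
        directory = Path.home() / "Library/Logs/CrashReporter"
    if not directory.is_dir():
        return []  # nothing has been logged for this user yet
    logs = [p for p in directory.iterdir() if p.is_file()]
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    reports = crash_reports()
    if reports:
        newest = reports[0]
        print(f"Newest report: {newest}")
        # Show only the first part; a report with ~20 thread backtraces is long.
        print(newest.read_text(errors="replace")[:2000])
    else:
        print("No crash reports found.")
```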

Other Approaches

You can also turn on crash log reporting in the console.

The shell's "limit" command sets the maximum size of a core dump. 64240 (in kilobytes, so roughly 64 MB) is a reasonable figure; do "man limit" for more information. A core dump limit of zero means no core file is written. For most purposes, the crash report is much easier to use than a core dump.
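The same core-size limit can also be set from a program, which is handy when a script launches the app you are debugging, because resource limits are inherited by child processes. A minimal sketch using Python's `resource` module (a wrapper around getrlimit(2)/setrlimit(2)); `set_core_limit` is a hypothetical helper, and the 64240 KB figure is the one suggested above, converted to bytes because setrlimit works in bytes.

```python
import resource

# The figure suggested above; the csh "limit" builtin takes kilobytes.
CORE_LIMIT_KB = 64240


def set_core_limit(kilobytes):
    """Set the soft core-file size limit for this process (and any children
    it starts afterwards), never exceeding the hard limit. Returns the new
    soft limit in bytes."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    wanted = kilobytes * 1024
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # an ordinary user cannot exceed the hard limit
    resource.setrlimit(resource.RLIMIT_CORE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)[0]


if __name__ == "__main__":
    print("core file limit is now", set_core_limit(CORE_LIMIT_KB), "bytes")
```

Passing 0 turns core files off again, matching the "limit of zero" behaviour described above.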

Why is something crashing?

First of all, look in that user's ~/Library/Logs/CrashReporter directory. See if there is a report there. What does it say?
Next, turn on core dumping and wait for the app to dump core, then use gdb to examine the core and find out why the process died. To turn on cores globally, put COREDUMPS=-YES- in /etc/hostconfig. This is probably what you want if you need a core from an arbitrary application.
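Editing /etc/hostconfig by hand is simple, but a script makes the change repeatable across machines. This is a minimal sketch, assuming hostconfig is a plain list of NAME=value lines; `enable_coredumps` is a hypothetical helper. It prints the edited file rather than overwriting /etc/hostconfig, so you can review it and copy it into place as root yourself.

```python
import re
import sys


def enable_coredumps(hostconfig_text):
    """Return the hostconfig text with COREDUMPS set to -YES-. An existing
    COREDUMPS= line is rewritten; otherwise a new line is appended."""
    setting = re.compile(r"^COREDUMPS=.*$", re.MULTILINE)
    if setting.search(hostconfig_text):
        return setting.sub("COREDUMPS=-YES-", hostconfig_text)
    if hostconfig_text and not hostconfig_text.endswith("\n"):
        hostconfig_text += "\n"  # keep the new setting on its own line
    return hostconfig_text + "COREDUMPS=-YES-\n"


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/hostconfig"
    with open(path) as f:
        sys.stdout.write(enable_coredumps(f.read()))
```

Since hostconfig is read by the startup scripts, expect the setting to take effect only after a restart.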

Next, what are the process limits? The default limits are quite low. Is the process failing because it is hitting a limit? A Java memory leak will eventually produce exactly this. Increase all the limits modestly and try again. You don't want to set the limits so high that the app takes two months to run into them; you just want to see whether it hits the same limit, a different one, or some other problem entirely.
The "datasize" limit is the maximum growth of the data+stack region via sbrk(2) beyond the end of the program text. The "stacksize" limit is the maximum size of the stack region.
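Here is a minimal sketch of "increase all the limits modestly" done from a launcher script rather than the shell. It prints each limit under the name the csh `limit` builtin uses, then raises the soft value by a factor (1.5 here, an arbitrary choice) without exceeding the hard limit. The names `LIMITS`, `bumped` and `show_and_bump` are hypothetical; the mapping to `resource.RLIMIT_*` constants assumes the usual correspondence between csh limit names and getrlimit(2) resources.

```python
import resource

# csh "limit" names mapped to the getrlimit(2) resources they control.
LIMITS = {
    "cputime": resource.RLIMIT_CPU,
    "filesize": resource.RLIMIT_FSIZE,
    "datasize": resource.RLIMIT_DATA,
    "stacksize": resource.RLIMIT_STACK,
    "coredumpsize": resource.RLIMIT_CORE,
    "descriptors": resource.RLIMIT_NOFILE,
}


def describe(value):
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def bumped(soft, hard, factor=1.5):
    """Return a soft limit about `factor` times the current one, capped at the
    hard limit. An unlimited soft limit stays unlimited."""
    if soft == resource.RLIM_INFINITY:
        return soft
    new = int(soft * factor)
    if hard != resource.RLIM_INFINITY:
        new = min(new, hard)
    return new


def show_and_bump(factor=1.5):
    """Print every limit, then raise its soft value for this process and for
    any children it starts afterwards."""
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        new = bumped(soft, hard, factor)
        print(f"{name:13} soft={describe(soft):>12} hard={describe(hard):>12}"
              f" -> {describe(new)}")
        if new != soft:
            try:
                resource.setrlimit(res, (new, hard))
            except (ValueError, OSError) as err:  # the kernel may refuse some values
                print(f"  could not raise {name}: {err}")


if __name__ == "__main__":
    show_and_bump()
```

To have the failing app run under the raised limits, call `show_and_bump()` and then start the app from the same script (for example with `subprocess.run`), since child processes inherit these limits.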

What do the log files show?

Look at the console log for error messages; Console is an application in the Utilities folder. Also look at the log files in /var/log/. The most informative general file is /var/log/system.log, a plain-text file containing entries like:

The system log has far more information than the console, recording essentially every interesting event and error. Older system.log files are compressed, numbered, and kept in the same directory.
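Because older logs are compressed, searching for a problem that started a few days ago means looking inside the .gz files too. A minimal sketch, assuming the current and rotated logs all match `system.log*` in /var/log and that compressed copies end in `.gz`; `grep_system_logs` is a hypothetical helper.

```python
import gzip
from pathlib import Path


def grep_system_logs(pattern, log_dir="/var/log"):
    """Yield (file name, line) for every line containing `pattern` in
    system.log and its rotated copies, reading .gz files transparently."""
    # Lexical order puts the current system.log before system.log.0.gz etc.
    for path in sorted(Path(log_dir).glob("system.log*")):
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", errors="replace") as f:
            for line in f:
                if pattern in line:
                    yield path.name, line.rstrip("\n")


if __name__ == "__main__":
    for name, line in grep_system_logs("Safari"):
        print(f"{name}: {line}")
```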

Using gdb to analyze a core file

http://howto.apple.com/db.cgi?Gdb

Hung Apps

Run "sample" on the hung application, giving its name (or process ID) and a duration in seconds, to get stack traces of what it's doing. E.g.

   % sample Finder 5
will produce output like this, showing what the Finder was doing, sampled every 10 milliseconds for 5 seconds (500 samples):
Analysis of sampling pid 412 every 10 milliseconds
Call graph:
    500 Thread_1103
      500 0x27300
        500 0x27480
          500 0x28400
            500 0x93e0
              500 0x4970
                500 WaitNextEvent
                  500 WNEInternal
                    500 GetNextEventMatchingMask
                      500 RunCurrentEventLoopInMode
                        500 CFRunLoopRunSpecific
                          500 __CFRunLoopRun
                            500 mach_msg
                              500 mach_msg_trap
                                500 mach_msg_trap [STACK TOP]
    500 Thread_1203
      500 _pthread_body
        500 _ZN7LThread11_RunWrapperEPv
          500 _ZN19NSLRequestMgrThread3RunEv
            500 CFRunLoopRun
              500 CFRunLoopRunSpecific
                500 __CFRunLoopRun
                  500 mach_msg
                    500 mach_msg_trap
                      500 mach_msg_trap [STACK TOP]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
        mach_msg_trap [STACK TOP]        1000

There is also a GUI tool, Sampler.app; like sample, it writes its output to a plain-text file in /tmp.
Run "ktrace" on the process to see what system calls it is making, and read the resulting trace with "kdump".

Kernel Panic

A step beyond a simple application crash is an actual OS crash. A kernel panic is one of the worst kinds of bugs anyone can encounter. In the Windows world, these are frequent enough to have a joke name: "the blue screen of death". The OS has encountered an unexpected, invalid state, and the only safe thing it can do is stop immediately before more data is lost. You lose all running applications and any unsaved data, you risk corrupting the filesystem, and you have to start again after a reboot.

I was running Mac OS X 10.2.3 and got kernel panics on Nov 25 2002 (twice), Nov 27 2002, and Feb 7 2003. I then upgraded to 10.2.4, and on March 10 2003 got another kernel panic. From Jaguar (10.2) on, Mac OS X shows this screen when the kernel panics:

We need to report every instance of this bug; otherwise the OS group can't fix them, and Apple products would be no better than Windows.

  • On your system, open the file /Library/Logs/panic.log and find the 20-30 lines that begin with the date of this panic. Copy those lines into the bug report.

    If you want to get more ambitious and do some kernel debugging on your own, or set things up so that OS programmers can debug the problem remotely on your system, set the "debug" parameter in NVRAM. This is done from the command line.

    You can find out more about this and about remote debugging from: http://developer.apple.com/techpubs/macosx/Darwin/howto/kext_tutorials/hello_debugger/hello_debugger.html
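Copying the right 20-30 lines out of panic.log can be scripted, which helps when several panics happened on the same day. This is a minimal sketch under a loudly stated assumption: that each entry in panic.log begins with a ctime-style timestamp line such as "Mon Mar 10 10:51:11 2003". Check your own panic.log and adjust the hypothetical `HEADER` pattern if it differs; `panic_entries` is also a hypothetical helper.

```python
import re
import sys

# ASSUMPTION: every panic entry starts with a ctime-style timestamp line,
# e.g. "Mon Mar 10 10:51:11 2003" or "Fri Feb  7 08:02:13 2003".
HEADER = re.compile(r"^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z]{2} +\d{1,2} ")


def panic_entries(log_text, date_prefix, max_lines=30):
    """Return each entry whose header line starts with `date_prefix`
    (e.g. "Mon Nov 25"), truncated to `max_lines` lines for a bug report."""
    entries, current = [], None
    for line in log_text.splitlines():
        if HEADER.match(line):
            # A new entry begins; keep it only if it is from the wanted date.
            current = [line] if line.startswith(date_prefix) else None
            if current is not None:
                entries.append(current)
        elif current is not None:
            current.append(line)
    return ["\n".join(entry[:max_lines]) for entry in entries]


if __name__ == "__main__":
    with open("/Library/Logs/panic.log", errors="replace") as f:
        for entry in panic_entries(f.read(), sys.argv[1]):
            print(entry, end="\n\n")
```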