Archive for October, 2015

android: child process hits mutex deadlock in printf after fork

October 31, 2015

This post is to demonstrate an example in which the child process hits mutex deadlock in printf after fork. This case is a special case of android: child process hits mutex deadlock after fork.

example program
android: example: child process hits mutex deadlock in printf after fork.

software of testing device
LA.BF64.1.1-06510-8x94.0 with Android 5.0.0_r2(LRX21M) and Linux kernel 3.10.49.

hardware of testing device
Architecture ARMv8, Cortex-A53.

log of running this program

$ adb shell /data/printf-deadlock-after-fork | grep -v -E "(busy dump)"
26153:26153, printf-deadlock-after-fork starts
26153:26155, void* thread_routine(void*) is ready to call printf in busy loop
26153:26153, parent process after fork

code flow

  • process 26153’s main thread 26153:26153 starts
  • thread 26153:26153 creates thread 26153:26155
  • thread 26153:26155 runs in thread_routine and calls printf in busy loop
  • thread 26153:26153 forks child process 26158
  • child process 26158 calls printf() and hits deadlock while acquiring the file lock of stdout
  • call stacks of the blocked child process at this point
    The pid of the child process is 26158. The state of the child process is sleeping.

    $ adb shell ps -t | grep printf
    root      26153 618   3584   828   ffffffff 9823b3ec S /data/printf-deadlock-after-fork
    root      26155 26153 3584   828   00000000 9823b8fc R printf-deadlock
    root      26158 26153 3584   260   00277680 981f2780 S /data/printf-deadlock-after-fork
    
    $ adb shell debuggerd64 -b 26158
    ----- pid 26158 at 2015-10-31 16:16:06 -----
    Cmd line: /data/printf-deadlock-after-fork
    ABI: 'arm64'
    
    "printf-deadlock" sysTid=26158
      #00 pc 000000000001377c  /system/lib64/libc.so (syscall+28)
      #01 pc 0000000000019cd4  /system/lib64/libc.so (pthread_mutex_lock+148)
      #02 pc 0000000000053918  /system/lib64/libc.so (vfprintf+24)
      #03 pc 000000000004f30c  /system/lib64/libc.so (printf+144)
      #04 pc 0000000000000d9c  /data/printf-deadlock-after-fork (main+572)
      #05 pc 0000000000013474  /system/lib64/libc.so (__libc_init+100)
      #06 pc 0000000000000e10  /data/printf-deadlock-after-fork
    
    ----- end 26158 -----
    

    The child process is blocked at FLOCKFILE(fp) in vfprintf. Here the child process waits for the mutex of stdout.

    $ aarch64-linux-android-addr2line -f -e symbols/system/lib64/libc.so -a 0x0000000000053918                                                                 
    0x0000000000053918                                                                                                                                                                                        
    vfprintf                                                                                                                                                                                                  
    bionic/libc/upstream-openbsd/lib/libc/stdio/vfprintf.c:266
    
    int
    vfprintf(FILE *fp, const char *fmt0, __va_list ap)
    {
    	int ret;
    
    	FLOCKFILE(fp);
    	ret = __vfprintf(fp, fmt0, ap);
    	FUNLOCKFILE(fp);
    	return (ret);
    }
    
    #define FLOCKFILE(fp)   flockfile(fp)
    #define FUNLOCKFILE(fp) funlockfile(fp)
    
    void flockfile(FILE* fp) {
      if (fp != NULL) {
        pthread_mutex_lock(&_FLOCK(fp));
      }
    }
    
    struct __sfileext {
    	struct	__sbuf _ub; /* ungetc buffer */
    	struct wchar_io_data _wcio;	/* wide char io status */
    	pthread_mutex_t _lock; /* file lock */
    };
    
    #define _FILEEXT_INITIALIZER  {{NULL,0},{0},PTHREAD_RECURSIVE_MUTEX_INITIALIZER}
    
    #define _EXT(fp) ((struct __sfileext *)((fp)->_ext._base))
    #define _UB(fp) _EXT(fp)->_ub
    #define _FLOCK(fp)  _EXT(fp)->_lock
    

    analysis: why this deadlock happens
    In android: child process hits mutex deadlock after fork, we conclude that the forked child process should avoid acquiring mutexes before calling exec() or _exit(). Otherwise, it might hit a mutex deadlock.

    In this case, one thread of the parent process calls printf() in a busy loop. If that thread is holding the mutex of stdout at fork time, then the child process hits the stdout file mutex deadlock in printf.

    how to fix this deadlock
    This patch, fix child process hits mutex deadlock in printf after fork, replaces printf() with write() to fix this problem. Unlike printf(), write() is a system call and directly requests the kernel to copy the message to stdout, without taking any user-space lock.

    
    diff --git a/PrintfDeadlockAfterFork.cpp b/PrintfDeadlockAfterFork.cpp
    index bdd1df2..12a6a27 100644
    --- a/PrintfDeadlockAfterFork.cpp
    +++ b/PrintfDeadlockAfterFork.cpp
    @@ -56,8 +56,11 @@ int main()
         pid = fork();
         if (pid == 0) {
             void *ptr;
    -        printf("%d:%d, child process after fork\n", getpid(), gettid());
    -        printf("%d:%d, is ready to _exit(1)\n", getpid(), gettid());
    +        char buf[4096];
    +        sprintf(buf, "%d:%d, child process after fork\n", getpid(), gettid());
    +        write(1, buf, strlen(buf));
    +        sprintf(buf, "%d:%d, is ready to _exit(1)\n", getpid(), gettid());
    +        write(1, buf, strlen(buf));
             _exit(1);
         }
    
    

    conclusion
    The forked child process should use write() rather than printf() before calling exec() or _exit(). Otherwise, it might hit a mutex deadlock in printf().

    android: child process hits mutex deadlock in exit after fork

    October 30, 2015

    This post is to demonstrate an example in which the child process hits mutex deadlock in exit after fork. This case is a special case of android: child process hits mutex deadlock after fork.

    example program
    android: example: child process hits mutex deadlock in exit after fork

    software of testing device
    LA.BF64.1.1-06510-8x94.0 with Android 5.0.0_r2(LRX21M) and Linux kernel 3.10.49.

    hardware of testing device
    Architecture ARMv8, Cortex-A53.

    log of running this program

    12663:12663, exit-deadlock-after-fork starts
    12663:12665, void* thread_routine(void*) calls mObj.lock()
    12663:12663, parent process after fork
    12666:12663, child process after fork
    12666:12663, is ready to exit(1)
    

    code flow

  • process 12663’s main thread 12663:12663 starts
  • thread 12663:12663 creates thread 12663:12665
  • thread 12663:12665 runs in thread_routine and locks mObj.mutex
  • thread 12663:12663 forks child process 12666
  • child process 12666 calls exit() and hits deadlock while destroying mObj, whose destructor requires mObj.mutex
  • backtrace of child process 12666 while deadlock

    $ adb shell debuggerd64 -b 12666
    ----- pid 12666 at 2015-10-30 14:29:49 -----
    Cmd line: /data/exit-deadlock-after-fork
    ABI: 'arm64'
    
    "exit-deadlock-a" sysTid=12666
      #00 pc 000000000001377c  /system/lib64/libc.so (syscall+28)
      #01 pc 0000000000019cd4  /system/lib64/libc.so (pthread_mutex_lock+148)
      #02 pc 0000000000000f00  /data/exit-deadlock-after-fork
      #03 pc 0000000000059f10  /system/lib64/libc.so (__cxa_finalize+324)
      #04 pc 0000000000012664  /system/lib64/libc.so (exit+20)
      #05 pc 0000000000000db8  /data/exit-deadlock-after-fork (main+584)
      #06 pc 0000000000013474  /system/lib64/libc.so (__libc_init+100)
      #07 pc 0000000000000e54  /data/exit-deadlock-after-fork
    

    analysis: why the child process is blocked at exit()
    Within exit(), the child process releases all global objects. If one thread of the parent process was holding a global object’s mutex at fork time, and that mutex is required in the object’s destructor, then the child process hits deadlock when it calls exit(), which destroys all global objects.

    $ aarch64-linux-android-addr2line -f -e symbols/system/lib64/libc.so -a 0000000000059f10
    0x0000000000059f10
    __cxa_finalize
    bionic/libc/upstream-openbsd/lib/libc/stdlib/atexit.c:141
    
    /*
     * Call all handlers registered with __cxa_atexit() for the shared
     * object owning 'dso'.
     * Note: if 'dso' is NULL, then all remaining handlers are called.
     */
    void
    __cxa_finalize(void *dso)
    {
    	struct atexit *p, *q;
    	struct atexit_fn fn;
    	int n, pgsize = getpagesize();
    	static int call_depth;
    
    	_ATEXIT_LOCK();
    	call_depth++;
    
    restart:
    	restartloop = 0;
    	for (p = __atexit; p != NULL; p = p->next) {
    		for (n = p->ind; --n >= 0;) {
    			if (p->fns[n].fn_ptr == NULL)
    				continue;	/* already called */
    			if (dso != NULL && dso != p->fns[n].fn_dso)
    				continue;	/* wrong DSO */
    
    			/*
    			 * Mark handler as having been already called to avoid
    			 * dupes and loops, then call the appropriate function.
    			 */
    			fn = p->fns[n];
    			if (mprotect(p, pgsize, PROT_READ | PROT_WRITE) == 0) {
    				p->fns[n].fn_ptr = NULL;
    				mprotect(p, pgsize, PROT_READ);
    			}
    			_ATEXIT_UNLOCK();
    			(*fn.fn_ptr)(fn.fn_arg);
    			_ATEXIT_LOCK();
    			if (restartloop)
    				goto restart;
    		}
    	}
    
    	call_depth--;
    
    	/*
    	 * If called via exit(), unmap the pages since we have now run
    	 * all the handlers.  We defer this until calldepth == 0 so that
    	 * we don't unmap things prematurely if called recursively.
    	 */
    	if (dso == NULL && call_depth == 0) {
    		for (p = __atexit; p != NULL; ) {
    			q = p;
    			p = p->next;
    			munmap(q, pgsize);
    		}
    		__atexit = NULL;
    	}
    	_ATEXIT_UNLOCK();
    }
    

    how to fix this deadlock
    This patch, fix child process hits mutex deadlock in exit after fork, replaces exit() with _exit() to fix this problem. Unlike exit(), _exit() is a system call and directly requests the kernel to terminate the process via do_exit(), without running atexit handlers or global destructors.

    diff --git a/ExitDeadlockAfterFork.cpp b/ExitDeadlockAfterFork.cpp
    index 98b2915..743f818 100644
    --- a/ExitDeadlockAfterFork.cpp
    +++ b/ExitDeadlockAfterFork.cpp
    @@ -92,8 +92,8 @@ int main()
         pid = fork();
         if (pid == 0) {
             printf("%d:%d, child process after fork\n", getpid(), gettid());
    -        printf("%d:%d, is ready to exit(1)\n", getpid(), gettid());
    -        exit(1);
    +        printf("%d:%d, is ready to _exit(1)\n", getpid(), gettid());
    +        _exit(1);
         }
     
         printf("%d:%d, parent process after fork\n", getpid(), gettid());
    
    

    conclusion
    If a forked child process doesn’t call exec() and wants to terminate itself, it should call _exit() rather than exit() to avoid deadlock.

    android: child process hits mutex deadlock after fork

    October 30, 2015

    This post is to demonstrate an example in which the child process hits mutex deadlock after fork. Below two are special cases of this post.

  • android: child process hits mutex deadlock in exit after fork
  • android: child process hits mutex deadlock in printf after fork

    example program
    android: example: child process hits mutex deadlock after fork.

    software of testing device
    LA.BF64.1.1-06510-8x94.0 with Android 5.0.0_r2(LRX21M) and Linux kernel 3.10.49.

    hardware of testing device
    Architecture ARMv8, Cortex-A53.

    log of running this program

    8973:8973, mutex-deadlock-after-fork starts
    8973:8975, void* thread_routine(void*)
    8973:8973, parent process after fork
    8976:8973, child process after fork
    8976:8973, child process is ready to get mutex
    8973:8975, void* thread_routine(void*)
    8973:8975, void* thread_routine(void*)
    8973:8975, void* thread_routine(void*)
    8973:8975, void* thread_routine(void*)
    8973:8975, void* thread_routine(void*)
    8973:8975, void* thread_routine(void*)
    

    code flow

  • process 8973’s main thread 8973:8973 starts
  • thread 8973:8973 creates thread 8973:8975
  • thread 8973:8975 runs in thread_routine and locks a mutex
  • thread 8973:8973 forks child process 8976
  • child process 8976 tries to acquire the locked mutex and hits deadlock

    analysis: why this deadlock happens
    After one thread of a multithreaded process calls fork(), the child process has only the calling thread duplicated and gets a copy of the parent’s address space. If another thread of the parent is holding a mutex at fork time, the mutex is also locked in the child. Since the holding thread was not duplicated, the mutex stays locked forever in the child. Thus, if the forked child process tries to acquire the mutex, it hits deadlock.

    conclusion
    The forked child process should avoid acquiring mutexes before calling exec() or _exit(). Otherwise, it might hit a mutex deadlock.

    android: arm64: unwind stack to get frames and backtrace of a thread

    October 29, 2015

    This post is to discuss backtrace, frame, and how frames make up a backtrace in arm64. We demonstrate an example in which we unwind stack to get frames and backtrace of a thread.

    testing environment
    The infrastructure code base here is LA.BF64.1.1-06510-8x94.0 with Android 5.0.0_r2(LRX21M) and Linux kernel 3.10.49. The device CPU is ARMv8 Cortex-A53.

    get backtrace from tombstone
    In android: arm64: analyze the call stack after a process hit native crash, we show how to get the backtrace of a process at native crash from its tombstone.

    ABI: 'arm64'
    pid: 20948, tid: 20948, name: coredumptest  >>> /data/coredumptest <<< 
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 
    .......
    backtrace:
        #00 pc 0000000000014434  /system/lib64/libc.so (strlen+16)
        #01 pc 0000000000000efc  /data/coredumptest
        #02 pc 0000000000000f84  /data/coredumptest
        #03 pc 000000000000100c  /data/coredumptest
        #04 pc 0000000000001094  /data/coredumptest
        #05 pc 0000000000000d78  /data/coredumptest (main+40)
        #06 pc 0000000000013474  /system/lib64/libc.so (__libc_init+100)
        #07 pc 0000000000000e8c  /data/coredumptest
    

    get the thread’s stack from tombstone
    The sp register points to the top of the thread’s stack. The stack has 8 frames, from frame #00 to frame #07. Each frame except frame #00 contains a symbol indicating the call site from which the frame was entered.

    ABI: 'arm64'
    pid: 20948, tid: 20948, name: coredumptest  >>> /data/coredumptest <<< 
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 
    .......
        x28  0000000000000000  x29  0000007fcbe33220  x30  0000005595326f00
        sp   0000007fcbe33220  pc   0000007faa597434  pstate 0000000040000000
    ......
    stack:
             0000007fcbe331a0  0000000000000000  
             0000007fcbe331a8  0000000000000000  
             0000007fcbe331b0  0000000000000000  
             0000007fcbe331b8  0000000000000000  
             0000007fcbe331c0  0000000000000000  
             0000007fcbe331c8  0000000000000000  
             0000007fcbe331d0  0000000000000000  
             0000007fcbe331d8  0000000000000000  
             0000007fcbe331e0  0000000000000000  
             0000007fcbe331e8  0000000000000000  
             0000007fcbe331f0  0000000000000000  
             0000007fcbe331f8  0000007faa63e040  /system/bin/linker64
             0000007fcbe33200  0000000000000006  
             0000007fcbe33208  0000000000000000  
             0000007fcbe33210  0000007fcbe33270  [stack]
             0000007fcbe33218  0000007faa5dcc98  /system/lib64/libc.so (__cxa_atexit+60)
        #00  0000007fcbe33220  0000007fcbe33240  [stack]
             ........  ........
        #01  0000007fcbe33220  0000007fcbe33240  [stack]
             0000007fcbe33228  0000005595326f88  /data/coredumptest
             0000007fcbe33230  ffffffffffffffff  
             0000007fcbe33238  0000007fcbe33348  [stack]
        #02  0000007fcbe33240  0000007fcbe33260  [stack]
             0000007fcbe33248  0000005595327010  /data/coredumptest
             0000007fcbe33250  ffffffffffffffff  
             0000007fcbe33258  0000007faa63e198  /system/bin/linker64
        #03  0000007fcbe33260  0000007fcbe33280  [stack]
             0000007fcbe33268  0000005595327098  /data/coredumptest
             0000007fcbe33270  ffffffffffffffff  
             0000007fcbe33278  0000007faa596468  /system/lib64/libc.so (__libc_init+88)
        #04  0000007fcbe33280  0000007fcbe332a0  [stack]
             0000007fcbe33288  0000005595326d7c  /data/coredumptest (main+44)
             0000007fcbe33290  ffffffffffffffff  
             0000007fcbe33298  0000005595326d50  /data/coredumptest (main)
        #05  0000007fcbe332a0  0000007fcbe332d0  [stack]
             0000007fcbe332a8  0000007faa596478  /system/lib64/libc.so (__libc_init+104)
             0000007fcbe332b0  0000007fcbe33358  [stack]
             0000007fcbe332b8  0000000000000000  
             0000007fcbe332c0  ffffffffffffffff  
             0000007fcbe332c8  ffffffffffffffff  
        #06  0000007fcbe332d0  0000007fcbe33300  [stack]
             0000007fcbe332d8  0000005595326e90  /data/coredumptest
             0000007fcbe332e0  0000000000000000  
             0000007fcbe332e8  0000000000000000  
             0000007fcbe332f0  0000000000000000  
             0000007fcbe332f8  0000000000000000  
        #07  0000007fcbe33300  0000000000000000
             0000007fcbe33308  0000007faa63f610  /system/bin/linker64 (_start+8)
             0000007fcbe33310  0000000000000000
             0000007fcbe33318  0000007fcbe33340  [stack]
             0000007fcbe33320  0000000000000000
             0000007fcbe33328  0000005595338ce0  /data/coredumptest
             0000007fcbe33330  0000005595338cf0  /data/coredumptest
             0000007fcbe33338  0000005595338d00  /data/coredumptest
             0000007fcbe33340  0000000000000001
             0000007fcbe33348  0000007fcbe33a3a  [stack]
             0000007fcbe33350  0000000000000000
             0000007fcbe33358  0000007fcbe33a4d  [stack]
             0000007fcbe33360  0000007fcbe33a62  [stack]
             0000007fcbe33368  0000007fcbe33a8e  [stack]
             0000007fcbe33370  0000007fcbe33aa1  [stack]
             0000007fcbe33378  0000007fcbe33d67  [stack]
    

    what is frame pointer
    According to the AArch64 Procedure Call Standard (AAPCS64), register x29 is the frame pointer and x30 is the link register. arm64 builds enable the frame pointer, which makes it easy to unwind a thread’s stack into frames. All frames form a linked list in which each frame points to the next through its saved frame pointer.

    how a frame is constructed and destructed
    While entering a function, a frame is created (lines 2 and 3 of the disassembly below):

  • stack pointer is decreased by the frame size (push).
  • frame pointer and link register are stored at the top of the frame.
  • frame pointer is updated to the new stack pointer.

    While leaving a function, the created frame is removed (line 11):

  • frame pointer and link register are loaded from the top of the frame.
  • stack pointer is increased by the frame size (pop).

    0000000000000ec0 <atexit>:
         ec0:   a9be7bfd    stp x29, x30, [sp,#-32]!
         ec4:   910003fd    mov x29, sp
         ec8:   f9000fa0    str x0, [x29,#24]
         ecc:   90000000    adrp    x0, 0 <writev@plt-0xbc0>
         ed0:   913a6000    add x0, x0, #0xe98
         ed4:   f0000081    adrp    x1, 13000 <__dso_handle>
         ed8:   91000022    add x2, x1, #0x0
         edc:   f9400fa1    ldr x1, [x29,#24]
         ee0:   97ffff74    bl  cb0 <__cxa_atexit@plt>
         ee4:   a8c27bfd    ldp x29, x30, [sp],#32
         ee8:   d65f03c0    ret   
    

    how all frames are linked with frame pointer
    All frames form a linked list. Frame pointer is the head of this linked list.

  • In this example, frame pointer x29 0000007fcbe33220 points to the top address of frame #01.

    ABI: 'arm64'
    pid: 20948, tid: 20948, name: coredumptest  >>> /data/coredumptest <<< 
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 
    .......
        x28  0000000000000000  x29  0000007fcbe33220  x30  0000005595326f00
        sp   0000007fcbe33220  pc   0000007faa597434  pstate 0000000040000000
    
  • frame pointer, 0x0000007fcbe33220, of frame #1 points to the top address of frame #02, 0000007fcbe33240
  • link register, 0x0000007fcbe33228, of frame #1 points to 0000005595326f88 /data/coredumptest
    #01  0000007fcbe33220  0000007fcbe33240  [stack]
             0000007fcbe33228  0000005595326f88  /data/coredumptest
             0000007fcbe33230  ffffffffffffffff  
             0000007fcbe33238  0000007fcbe33348  [stack]
    
  • frame pointer, 0000007fcbe33240, of frame #2 points to the top address of frame #03, 0000007fcbe33260
  • link register, 0000007fcbe33248, of frame #2 points to 0000005595327010 /data/coredumptest
    #02  0000007fcbe33240  0000007fcbe33260  [stack]
             0000007fcbe33248  0000005595327010  /data/coredumptest
             0000007fcbe33250  ffffffffffffffff  
             0000007fcbe33258  0000007faa63e198  /system/bin/linker64
    
  • frame pointer, 0000007fcbe33260, of frame #3 points to the top address of frame #04, 0000007fcbe33280
  • link register, 0000007fcbe33268, of frame #3 points to 0000005595327098 /data/coredumptest
    #03  0000007fcbe33260  0000007fcbe33280  [stack]
             0000007fcbe33268  0000005595327098  /data/coredumptest
             0000007fcbe33270  ffffffffffffffff  
             0000007fcbe33278  0000007faa596468  /system/lib64/libc.so (__libc_init+88)
    
  • frame pointer, 0000007fcbe33280, of frame #4 points to the top address of frame #05, 0000007fcbe332a0
    #04  0000007fcbe33280  0000007fcbe332a0  [stack]
             0000007fcbe33288  0000005595326d7c  /data/coredumptest (main+44)
             0000007fcbe33290  ffffffffffffffff  
             0000007fcbe33298  0000005595326d50  /data/coredumptest (main)
    
  • frame pointer, 0000007fcbe332a0, of frame #5 points to the top address of frame #06, 0000007fcbe332d0
  • link register, 0000007fcbe332a8, of frame #5 points to 0000007faa596478 /system/lib64/libc.so (__libc_init+104)
    #05  0000007fcbe332a0  0000007fcbe332d0  [stack]
             0000007fcbe332a8  0000007faa596478  /system/lib64/libc.so (__libc_init+104)
             0000007fcbe332b0  0000007fcbe33358  [stack]
             0000007fcbe332b8  0000000000000000  
             0000007fcbe332c0  ffffffffffffffff  
             0000007fcbe332c8  ffffffffffffffff  
    
  • frame pointer, 0000007fcbe332d0, of frame #6 points to the top address of frame #07, 0000007fcbe33300
  • link register, 0000007fcbe332d8, of frame #6 points to 0000005595326e90 /data/coredumptest
    #06  0000007fcbe332d0  0000007fcbe33300  [stack]
             0000007fcbe332d8  0000005595326e90  /data/coredumptest
             0000007fcbe332e0  0000000000000000  
             0000007fcbe332e8  0000000000000000  
             0000007fcbe332f0  0000000000000000  
             0000007fcbe332f8  0000000000000000  
    
  • frame pointer, 0000007fcbe33300, of frame #7 points to 0000000000000000. It’s the last frame.
  • link register, 0000007fcbe33308, of frame #7 points to 0000007faa63f610 /system/bin/linker64 (_start+8)
    #07  0000007fcbe33300  0000000000000000
             0000007fcbe33308  0000007faa63f610  /system/bin/linker64 (_start+8)
             0000007fcbe33310  0000000000000000
             0000007fcbe33318  0000007fcbe33340  [stack]
             0000007fcbe33320  0000000000000000
             0000007fcbe33328  0000005595338ce0  /data/coredumptest
             0000007fcbe33330  0000005595338cf0  /data/coredumptest
             0000007fcbe33338  0000005595338d00  /data/coredumptest
             0000007fcbe33340  0000000000000001
             0000007fcbe33348  0000007fcbe33a3a  [stack]
             0000007fcbe33350  0000000000000000
             0000007fcbe33358  0000007fcbe33a4d  [stack]
             0000007fcbe33360  0000007fcbe33a62  [stack]
             0000007fcbe33368  0000007fcbe33a8e  [stack]
             0000007fcbe33370  0000007fcbe33aa1  [stack]
             0000007fcbe33378  0000007fcbe33d67  [stack]
    

    how to get backtrace from linked frames
    The link register of each frame is stored in the second double word of the frame. Since a saved link register holds the return address, subtracting 4 (one instruction) yields the pc of the corresponding call site.

  • #00 pc is the current pc register 0000007faa597434
  • #01 pc is the current link register 0000005595326f00 - 0x4 = 0000005595326efc
  • #02 pc is the link register of frame #01 0000005595326f88 - 0x4 = 0000005595326f84
  • #03 pc is the link register of frame #02 0000005595327010 - 0x4 = 000000559532700c
  • #04 pc is the link register of frame #03 0000005595327098 - 0x4 = 0000005595327094
  • #05 pc is the link register of frame #04 0000005595326d7c - 0x4 = 0000005595326d78
  • #06 pc is the link register of frame #05 0000007faa596478 - 0x4 = 0000007faa596474
  • #07 pc is the link register of frame #06 0000005595326e90 - 0x4 = 0000005595326e8c

    The backtrace unwound from the stack is the same as the backtrace from the tombstone and from gdb loading the core file in android: arm64: analyze the call stack after a process hit native crash.

    ABI: 'arm64'
    pid: 20948, tid: 20948, name: coredumptest  >>> /data/coredumptest <<< 
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 
    ......
        x28  0000000000000000  x29  0000007fcbe33220  x30  0000005595326f00
        sp   0000007fcbe33220  pc   0000007faa597434  pstate 0000000040000000
    ......
    backtrace:
        #00 pc 0000000000014434  /system/lib64/libc.so (strlen+16)
        #01 pc 0000000000000efc  /data/coredumptest
        #02 pc 0000000000000f84  /data/coredumptest
        #03 pc 000000000000100c  /data/coredumptest
        #04 pc 0000000000001094  /data/coredumptest
        #05 pc 0000000000000d78  /data/coredumptest (main+40)
        #06 pc 0000000000013474  /system/lib64/libc.so (__libc_init+100)
        #07 pc 0000000000000e8c  /data/coredumptest
    
    (gdb) bt
    #0  strlen () at bionic/libc/arch-arm64/generic/bionic/strlen.S:71
    #1  0x0000005595326f00 in strlen (s=0x0) at bionic/libc/include/string.h:239
    #2  test4 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:11
    #3  0x0000005595326f88 in test3 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:20
    #4  0x0000005595327010 in test2 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:29
    #5  0x0000005595327098 in test1 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:38
    #6  0x0000005595326d7c in main () at frameworks/native/services/coredumptest/CoredumpTest.cpp:56
    

    conclusion
    In this post, we explain what backtraces and frames are. We then show how to use registers of the current context, such as the stack pointer and frame pointer, to get the frames and backtrace of a thread.

    android: arm64: how to analyze the call stack after a process hit native crash

    October 28, 2015

    This post is to analyze the call stack after a process hits a native crash in android.

    testing environment
    The infrastructure code base here is LA.BF64.1.1-06510-8x94.0 with Android 5.0.0_r2(LRX21M) and Linux kernel 3.10.49. The device CPU is ARMv8 Cortex-A53.

    use gdb to get call stack from core file
    In android: coredump: analyze core file with gdb, we demonstrate how to use gdb to load the core file and get the call stack of coredumptest, which dereferences a NULL pointer and hits a native crash.

    (gdb) bt
    #0  strlen () at bionic/libc/arch-arm64/generic/bionic/strlen.S:71
    #1  0x0000005595326f00 in strlen (s=0x0) at bionic/libc/include/string.h:239
    #2  test4 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:11
    #3  0x0000005595326f88 in test3 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:20
    #4  0x0000005595327010 in test2 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:29
    #5  0x0000005595327098 in test1 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:38
    #6  0x0000005595326d7c in main () at frameworks/native/services/coredumptest/CoredumpTest.cpp:56
    

    get call stack from tombstone
    In addition to the core file, we can also get call stacks from the tombstone. When a 64-bit process hits a native crash, debuggerd64 will attach to the process and dump its registers and call stacks in /data/tombstones/tombstone_0x, where 0 <= x <= 9.

    ABI: 'arm64'
    pid: 20948, tid: 20948, name: coredumptest  >>> /data/coredumptest <<< 
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 
        x0   0000000000000000  x1   0000000000000000  x2   0000007fcbe33358  x3   000000000000000a
        x4   0000000000000001  x5   0000000000000000  x6   000000000000000b  x7   0000000000000000
        x8   00000000000000a4  x9   0000000000000000  x10  0000007fcbe32f88  x11  0101010101010101
        x12  0000000000000001  x13  000000000000001e  x14  0000007faa6560f0  x15  0000007faa656100
        x16  0000005595338fb8  x17  0000007faa597424  x18  0000000000000000  x19  ffffffffffffffff
        x20  0000007fcbe33348  x21  0000000000000001  x22  0000005595326d50  x23  0000000000000000
        x24  0000000000000000  x25  0000000000000000  x26  0000000000000000  x27  0000000000000000
        x28  0000000000000000  x29  0000007fcbe33220  x30  0000005595326f00
        sp   0000007fcbe33220  pc   0000007faa597434  pstate 0000000040000000
        v0   2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e  v1   746165662e6d70642e74736973726570
        v2   6f63656e6362696c0000000000657275  v3   00000000000000000000000000000000
        v4   00000000000000008020080280000000  v5   00000000400000004000000000000000
        v6   00000000000000000000000000000000  v7   80200802802008028020080280200802
        v8   00000000000000000000000000000000  v9   00000000000000000000000000000000
        v10  00000000000000000000000000000000  v11  00000000000000000000000000000000
        v12  00000000000000000000000000000000  v13  00000000000000000000000000000000
        v14  00000000000000000000000000000000  v15  00000000000000000000000000000000
        v16  40100401401004014010040140100401  v17  00000000a00a80000000aa8000404000
        v18  00000000000000008020080280000000  v19  00000000000000000000000000000000
        v20  00000000000000000000000000000000  v21  00000000000000000000000000000000
        v22  00000000000000000000000000000000  v23  00000000000000000000000000000000
        v24  00000000000000000000000000000000  v25  00000000000000000000000000000000
        v26  00000000000000000000000000000000  v27  00000000000000000000000000000000
        v28  00000000000000000000000000000000  v29  00000000000000000000000000000000
        v30  00000000000000000000000000000000  v31  00000000000000000000000000000000
        fpsr 00000000  fpcr 00000000
    
    backtrace:
        #00 pc 0000000000014434  /system/lib64/libc.so (strlen+16)
        #01 pc 0000000000000efc  /data/coredumptest
        #02 pc 0000000000000f84  /data/coredumptest
        #03 pc 000000000000100c  /data/coredumptest
        #04 pc 0000000000001094  /data/coredumptest
        #05 pc 0000000000000d78  /data/coredumptest (main+40)
        #06 pc 0000000000013474  /system/lib64/libc.so (__libc_init+100)
        #07 pc 0000000000000e8c  /data/coredumptest
    

    use addr2line to analyze call stacks in tombstone
    We can use addr2line to translate a symbol address into the source-level function name and line number.

    $ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 0000000000000ef8
    0x0000000000000ef8
    _Z5test4v
    frameworks/native/services/coredumptest/CoredumpTest.cpp:10
    $ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 0000000000000f84
    0x0000000000000f84
    _Z5test3v
    frameworks/native/services/coredumptest/CoredumpTest.cpp:20
    $ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 000000000000100c
    0x000000000000100c
    _Z5test2v
    frameworks/native/services/coredumptest/CoredumpTest.cpp:29
    $ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 0000000000001094
    0x0000000000001094
    _Z5test1v
    frameworks/native/services/coredumptest/CoredumpTest.cpp:38
    $ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 0000000000000d78       
    0x0000000000000d78
    main
    frameworks/native/services/coredumptest/CoredumpTest.cpp:56
    

    review source code to see why the native crash happens
    From the source code, we can find that the native crash is due to dereferencing a NULL pointer.

    #define LOG_TAG "CoredumpTest"
    
    #include <utils/Log.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/resource.h>
    
    using namespace android;
    
    int test4()
    {
        int ret = strlen(NULL);
    
        ALOGD("enter %s: %d", __func__,  ret);
    
        return ret;
    }
    
    int test3()
    {
        int ret = test4() + 3;
    
        ALOGD("enter %s: %d", __func__, ret);
    
        return ret;
    }
    
    int test2()
    {
        int ret = test3() + 2;
    
        ALOGD("enter %s: %d", __func__, ret);
    
        return ret;
    }
    
    int test1()
    {
        int ret = test2() + 1;
    
        ALOGD("enter %s: %d", __func__, ret);
    
        return ret;
    }
    
    int main()
    {
        struct rlimit core_limit;
        core_limit.rlim_cur = RLIM_INFINITY;
        core_limit.rlim_max = RLIM_INFINITY;
    
        if (setrlimit(RLIMIT_CORE, &core_limit) < 0) {
            ALOGD("Failed to setrlimit: %s", strerror(errno));
            return 1;
        }
    
        int n = test1();
        ALOGD("Ready to enter test");
    
        return 0;
    }
    

    conclusion
    After a process hits a native crash, we can analyze its call stack from the core file or tombstone. Then, we review the source code to see why the crash happens.

    android: coredump: analyze core file with gdb

    October 27, 2015

    In android: coredump; how to make kernel dump core file after some process crashes, we discuss how to get the core file of a process after it crashes natively. In this post, we discuss how to analyze the core file with gdb.

    run gdb and setup environment

  • start gdb
  • $ cd ~/android_source/
    $ ./prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/bin/aarch64-linux-android-gdb
    GNU gdb (GDB) 7.7
    Copyright (C) 2014 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "--host=x86_64-linux-gnu --target=aarch64-elf-linux".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://source.android.com/source/report-bugs.html>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word".
    
  • setup file
  • (gdb) file out/target/product/${project}/symbols/system/bin/coredumptest
    Reading symbols from symbols/system/bin/coredumptest...done.
    
  • setup dynamic library search path
  • (gdb) set solib-search-path out/target/product/${project}/symbols/system/lib64/
    
  • setup core-file
  • (gdb) core-file 20948.coredumptest.18446744073709551615.core
    [New LWP 20948]
    warning: Could not load shared library symbols for 4 libraries, e.g. /system/bin/linker64.
    Use the "info sharedlibrary" command to see the complete listing.
    Do you need "set solib-search-path" or "set sysroot"?
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  strlen () at bionic/libc/arch-arm64/generic/bionic/strlen.S:71
    warning: Source file is more recent than executable.
    71              ldp     data1, data2, [src], #16
    

    dump stack of the crashing thread

  • Use the backtrace (bt) command to get the call stack
  • (gdb) bt
    #0  strlen () at bionic/libc/arch-arm64/generic/bionic/strlen.S:71
    #1  0x0000005595326f00 in strlen (s=0x0) at bionic/libc/include/string.h:239
    #2  test4 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:11
    #3  0x0000005595326f88 in test3 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:20
    #4  0x0000005595327010 in test2 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:29
    #5  0x0000005595327098 in test1 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:38
    #6  0x0000005595326d7c in main () at frameworks/native/services/coredumptest/CoredumpTest.cpp:56
    
  • The gdb call stack matches the code flow of the native crash in coredumptest
  • int test4()
    {
        int ret = strlen(NULL);
        ......
        return ret;
    }
    
    int test3()
    {
        int ret = test4() + 3;
        ......
        return ret;
    }
    
    int test2()
    {
        int ret = test3() + 2;
        ......
        return ret;
    }
    
    int test1()
    {
        int ret = test2() + 1;
        ......
        return ret;
    }
    
    int main()
    {
        ......
        int n = test1();
        ......
        return 0;
    }
    

    dump registers of the crashing thread
    Use the info registers command to get the values of all registers

    (gdb) info registers
    x0             0x0      0
    x1             0x0      0
    x2             0x7fcbe33358     548881511256
    x3             0xa      10
    x4             0x1      1
    x5             0x0      0
    x6             0xb      11
    x7             0x0      0
    x8             0xa4     164
    x9             0x0      0
    x10            0x7fcbe32f88     548881510280
    x11            0x101010101010101        72340172838076673
    x12            0x1      1
    x13            0x1e     30
    x14            0x7faa6560f0     548319617264
    x15            0x7faa656100     548319617280
    x16            0x5595338fb8     367575404472
    x17            0x7faa597424     548318835748
    x18            0x0      0
    x19            0xffffffffffffffff       -1
    x20            0x7fcbe33348     548881511240
    x21            0x1      1
    x22            0x5595326d50     367575330128
    x23            0x0      0
    x24            0x0      0
    x25            0x0      0
    x26            0x0      0
    x27            0x0      0
    x28            0x0      0
    x29            0x7fcbe33220     548881510944
    x30            0x5595326f00     367575330560
    sp             0x7fcbe33220     0x7fcbe33220
    pc             0x7faa597434     0x7faa597434 <strlen+16>
    cpsr           0x40000000       1073741824
    fpsr           0x0      0
    fpcr           0x0      0
    

    dump memory near the stack of the crashing thread

    (gdb) x/100x 0x7fcbe33220
    0x7fcbe33220:   0xcbe33240      0x0000007f      0x95326f88      0x00000055
    0x7fcbe33230:   0xffffffff      0xffffffff      0xcbe33348      0x0000007f
    0x7fcbe33240:   0xcbe33260      0x0000007f      0x95327010      0x00000055
    0x7fcbe33250:   0xffffffff      0xffffffff      0xaa63e198      0x0000007f
    0x7fcbe33260:   0xcbe33280      0x0000007f      0x95327098      0x00000055
    0x7fcbe33270:   0xffffffff      0xffffffff      0xaa596468      0x0000007f
    0x7fcbe33280:   0xcbe332a0      0x0000007f      0x95326d7c      0x00000055
    0x7fcbe33290:   0xffffffff      0xffffffff      0x95326d50      0x00000055
    0x7fcbe332a0:   0xcbe332d0      0x0000007f      0xaa596478      0x0000007f
    0x7fcbe332b0:   0xcbe33358      0x0000007f      0x00000000      0x00000000
    0x7fcbe332c0:   0xffffffff      0xffffffff      0xffffffff      0xffffffff
    0x7fcbe332d0:   0xcbe33300      0x0000007f      0x95326e90      0x00000055
    0x7fcbe332e0:   0x00000000      0x00000000      0x00000000      0x00000000
    0x7fcbe332f0:   0x00000000      0x00000000      0x00000000      0x00000000
    0x7fcbe33300:   0x00000000      0x00000000      0xaa63f610      0x0000007f
    0x7fcbe33310:   0x00000000      0x00000000      0xcbe33340      0x0000007f
    0x7fcbe33320:   0x00000000      0x00000000      0x95338ce0      0x00000055
    0x7fcbe33330:   0x95338cf0      0x00000055      0x95338d00      0x00000055
    0x7fcbe33340:   0x00000001      0x00000000      0xcbe33a3a      0x0000007f
    0x7fcbe33350:   0x00000000      0x00000000      0xcbe33a4d      0x0000007f
    0x7fcbe33360:   0xcbe33a62      0x0000007f      0xcbe33a8e      0x0000007f
    0x7fcbe33370:   0xcbe33aa1      0x0000007f      0xcbe33d67      0x0000007f
    0x7fcbe33380:   0xcbe33da4      0x0000007f      0xcbe33dbd      0x0000007f
    0x7fcbe33390:   0xcbe33ddf      0x0000007f      0xcbe33dfe      0x0000007f
    0x7fcbe333a0:   0xcbe33e13      0x0000007f      0xcbe33e3d      0x0000007f
    

    conclusion
    In this post, we demonstrate how to run and set up gdb to load a core file. We also demonstrate basic gdb commands to show call stacks, registers, and some memory contents.

    android: coredump; how to make kernel dump core file after some process crashes

    October 27, 2015

    This post is to discuss how to make the kernel dump a core file after a process crashes. The testing code base is android-5.0.2_r1. In android: coredump: analyze core file with gdb, we further demonstrate how to use gdb to analyze core files.

    enable coredump in kernel
    Enable CONFIG_COREDUMP while building kernel image.

    specify core file absolute path and name

    $ adb shell "echo /data/core/%p.%e.core > /proc/sys/kernel/core_pattern"
    
    1. %p indicates the pid of the dumped process
    2. %e indicates the executable name

    set core file size for the process
    If the resource limit of core file size is 0, the kernel won't dump a core file for this process when it terminates. Thus, the process needs to raise its core file size limit if the limit is 0.

    int main()
    {
        struct rlimit core_limit;
        core_limit.rlim_cur = RLIM_INFINITY;
        core_limit.rlim_max = RLIM_INFINITY;
    
        if (setrlimit(RLIMIT_CORE, &core_limit) < 0) {
            ALOGD("Failed to setrlimit: %s", strerror(errno));
            return 1;
        }
    
        int n = test1();
        ALOGD("Ready to enter test");
    
        return 0;
    }
    

    a native process example which produces a core file after it crashes
    The native example coredumptest sets the resource limit of core file size to infinity. It then dereferences a NULL pointer and hits a native crash.

    int test4()
    {
        int ret = strlen(NULL);
    
        ALOGD("enter %s: %d", __func__,  ret);
    
        return ret;
    }
    

    The core file is dumped under /data/core. In this case, 20577 is the pid and coredumptest is the name of the executable file.

    $ adb shell ls /data/core
    20577.coredumptest.core
    

    conclusion
    This post shows how to make the kernel dump a core file after a process crashes. We also give an example, coredumptest, which crashes immediately and produces a core file if the kernel supports coredump.

    reference

    1. Turn on core/crash dumps programatically
    2. Changing location of core dump

    android: binder: error log: binder_alloc_buf, no vma

    October 19, 2015

    This post is to discuss binder error log: binder_alloc_buf, no vma. The reference code base is android kernel 3.4. The log is borrowed from https://community.freescale.com/thread/342488.

    symptom: after process pid=3057 crashes, lots of binder: 3057: binder_alloc_buf, no vma logs show up

    [  174.653668] init: untracked pid 3113 exited
    [  174.659857] init: untracked pid 3262 exited
    [  174.665214] init: untracked pid 3279 exited
    [  174.670170] init: untracked pid 3462 exited
    [  174.674735] init: untracked pid 3538 exited
    [  174.679930] init: untracked pid 3057 exited
    [  174.759346] init: untracked pid 3870 exited
    [  174.839368] alarm_release: clear alarm, pending 0
    [  174.844092] alarm_release: clear alarm, pending 0
    [  175.003312] binder: 3057: binder_alloc_buf, no vma
    [  175.008343] binder: 4008:4104 transaction failed 29201, size 124-4
    [  175.015018] binder: 3057: binder_alloc_buf, no vma
    [  175.019899] binder: 4008:4104 transaction failed 29201, size 124-4
    

    analysis
    In android: binder: resources, we discuss that binder_vma and the binder fd are both released within do_exit, which is executed when a process crashes. Among these resources, binder_vma is released before the binder fd. Thus, it's very common to see these logs when a process crashes.

    The log shows that thread 4008:4104 tries to initiate a binder transaction and allocate a binder buffer within 3057's binder_vma. However, 3057 is in do_exit and has already released its binder_vma but has not yet released its binder fd. Thus, 4008:4104 can initiate a binder transaction but fails to allocate a binder buffer in 3057's binder_vma, and gets BR_FAILED_REPLY.

    static struct binder_buffer *binder_alloc_buf(struct binder_proc *proc,
    					      size_t data_size,
    					      size_t offsets_size, int is_async)
    {
    	struct rb_node *n = proc->free_buffers.rb_node;
    	struct binder_buffer *buffer;
    	size_t buffer_size;
    	struct rb_node *best_fit = NULL;
    	void *has_page_addr;
    	void *end_page_addr;
    	size_t size;
    	if (proc->vma == NULL) {
    		printk(KERN_ERR "binder: %d: binder_alloc_buf, no vma\n",
    		       proc->pid);
    		return NULL;
    	}
    ......
    }
    
    static void binder_transaction(struct binder_proc *proc,
    			       struct binder_thread *thread,
    			       struct binder_transaction_data *tr, int reply)
    {
    ......
    	t->buffer = binder_alloc_buf(target_proc, tr->data_size,
    		tr->offsets_size, !reply && (t->flags & TF_ONE_WAY));
    	if (t->buffer == NULL) {
    		return_error = BR_FAILED_REPLY;
    		goto err_binder_alloc_buf_failed;
    	}
    ......
    err_binder_alloc_buf_failed:
    	kfree(tcomplete);
    	binder_stats_deleted(BINDER_STAT_TRANSACTION_COMPLETE);
    err_alloc_tcomplete_failed:
    	kfree(t);
    	binder_stats_deleted(BINDER_STAT_TRANSACTION);
    err_alloc_t_failed:
    err_bad_call_stack:
    err_empty_call_stack:
    err_dead_binder:
    err_invalid_target_handle:
    err_no_context_mgr_node:
    	binder_debug(BINDER_DEBUG_FAILED_TRANSACTION,
    		     "binder: %d:%d transaction failed %d, size %zd-%zd\n",
    		     proc->pid, thread->pid, return_error,
    		     tr->data_size, tr->offsets_size);
    	{
    ......
    }
    

    conclusion
    After a process crashes, it's very common to see these logs since binder_vma has already been released. But if these logs keep appearing for a long time, it is a real issue, which we'll discuss in another post.

    android: binder: resources

    October 19, 2015

    This post is to discuss the resources used by binder mechanisms, including when these resources are requested and released. The reference source code is hello service in android-5.0.2_r1 .

    binder server: open /dev/binder

  • alloc one binder fd
  • The ProcessState constructor opens /dev/binder. This operation consumes one fd. By default, each process has 1024 fds. Since each binder server opens /dev/binder at a very early stage, it seldom fails to get the fd.

    int main()
    {
        LOGD("HelloServer is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        defaultServiceManager()->addService(String16("hello"), new HelloService);
    
        LOGD("Successfully register service: hello");
        LOGD("Ready to joinThreadPool");
    
        IPCThreadState::self()->joinThreadPool();
    
        return 0;
    }
    

    binder server: mmap (2MB-8KB) of /dev/binder file

  • alloc 2MB physical memory
  • alloc 2MB user space virtual address space, i.e., binder vma
  • alloc 2MB kernel vmalloc virtual address space
  • The ProcessState constructor also mmaps about 2MB of the fd obtained by opening /dev/binder. This mmap allocates 2MB of physical memory, which is mapped by both the user space virtual address space and the kernel vmalloc address space.

    ProcessState::ProcessState()
        : mDriverFD(open_driver())
        , mVMStart(MAP_FAILED)
        , mManagesContexts(false)
        , mBinderContextCheckFunc(NULL)
        , mBinderContextUserData(NULL)
        , mThreadPoolStarted(false)
        , mThreadPoolSeq(1)
    {
        if (mDriverFD >= 0) {
            // XXX Ideally, there should be a specific define for whether we
            // have mmap (or whether we could possibly have the kernel module
            // availabla).
    #if !defined(HAVE_WIN32_IPC)
            // mmap the binder, providing a chunk of virtual address space to receive transactions.
            mVMStart = mmap(0, BINDER_VM_SIZE, PROT_READ, MAP_PRIVATE | MAP_NORESERVE, mDriverFD, 0);
            if (mVMStart == MAP_FAILED) {
                // *sigh*
                ALOGE("Using /dev/binder failed: unable to mmap transaction memory.\n");
                close(mDriverFD);
                mDriverFD = -1;
            }
    #else
            mDriverFD = -1;
    #endif
        }
        LOG_ALWAYS_FATAL_IF(mDriverFD < 0, "Binder driver could not be opened.  Terminating.");
    }
    

    binder server: addService

  • a binder service with unique name
  • Each binder service has a unique name. You can use the command below to check all registered services.

    $ adb shell service list
    
    int main()
    {
        LOGD("HelloServer is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        defaultServiceManager()->addService(String16("hello"), new HelloService);
    
        LOGD("Successfully register service: hello");
        LOGD("Ready to joinThreadPool");
    
        IPCThreadState::self()->joinThreadPool();
    
        return 0;
    }
    

    binder client: getService
    getService gets a BpService for the requested binder service.

    int main()
    {
        sp<IHelloService> service = interface_cast<IHelloService>(defaultServiceManager()->getService(String16("hello")));
    
        LOGD("service->add(4, 3) = %d", service->add(4, 3));
        LOGD("service->sub(4, 3) = %d", service->sub(4, 3));
    
        return 0;
    }
    

    binder client: request a binder transaction

  • alloc a binder buffer within the server’s binder vma whose size is 2MB
  • Each binder transaction requires a binder buffer in the server's binder vma. The binder buffers of all outstanding binder transactions together occupy at most the whole binder vma. However, the binder buffers of all outstanding asynchronous transactions together occupy at most half of the binder vma.

    static int binder_mmap(struct file *filp, struct vm_area_struct *vma)
    {
    ......
            proc->buffer_size = vma->vm_end - vma->vm_start;
    ......
            proc->free_async_space = proc->buffer_size / 2;
    ......
    }
    

    binder server: execute the binder transaction

  • release the binder buffer allocated by the client

    binder server: exit

  • release the 2MB binder vma
  • release the 2MB vmalloc address space
  • release the 2MB physical memory
  • release the binder fd
  • In android: binder: error log: binder_alloc_buf, no vma, we discuss the error logs that appear when binder_vma is not available.

    conclusion
    In this post, we go through each stage of the binder mechanism, elaborate what resources are needed in each stage, and discuss the situations when resources are not available.

    android: binder: native binder example: client sends a callback to server, then server triggers this callback

    October 16, 2015

    This post demonstrates an example in which client sends a callback to server, then server triggers this callback. The test code base is Android 5.0.2 LRX22G.

    source code
    The timerservice-and-callbackservice has four modules: libtimerservice, libcallbackservice, timerclient, and timerserver. Below explains the functionality of each module.

    libtimerservice
    This module declares and implements timer service which provides API registerCallback.

    The class ITimerService declares the API registerCallback.

    class ITimerService: public IInterface {
    public:
        DECLARE_META_INTERFACE(TimerService);
    
        virtual void registerCallback(sp<IBinder>& binder, int timeout) = 0;
    };
    

    The class TimerService implements the API registerCallback.

    void TimerService::registerCallback(sp<IBinder>& binder, int timeout)
    {
        Mutex::Autolock _l(mLock);
    
        sp<ICallbackService> callback = interface_cast<ICallbackService>(binder);
        mCallbacks.push(callback);
        mTimeouts.add(callback, timeout);
        LOGD("Service does registerCallback timeout = 30");
    }
    

    libcallbackservice
    This module declares and implements callback service which provides API execute.

    The class ICallbackService declares the API execute.

    class ICallbackService: public IInterface {
    public:
        DECLARE_META_INTERFACE(CallbackService);
    
        virtual void execute() = 0;
    };
    

    The class CallbackService implements the API execute.

    void CallbackService::execute()
    {
        Mutex::Autolock _l(mLock);
    
        LOGD("callback is executed");
        mCondition.signal();
    }
    

    timerserver
    timerserver registers the timer service.

    int main()
    {
        LOGD("TimerServer is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        defaultServiceManager()->addService(String16("timer"), new TimerService);
    
        LOGD("Successfully register service: timer");
        LOGD("Ready to joinThreadPool");
    
        IPCThreadState::self()->joinThreadPool();
    
        return 0;
    }
    

    timerclient
    timerclient gets the timer service, sends a callback to the timer service, and waits for the TimerServiceThread of timerserver to trigger the callback. After a binder thread of timerclient executes the callback, the main thread of timerclient exits the process.

    int main()
    {
        LOGD("TimerClient is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        sp<ITimerService> service = interface_cast<ITimerService>(defaultServiceManager()->getService(String16("timer")));
        LOGD("Successfully get timer service");
    
        sp<CallbackService> callback(new CallbackService);
        sp<IBinder> binder(callback.get());
    
        LOGD("Request to registerCallback, timeout = 30");
        service->registerCallback(binder, 30);
    
        {
            Mutex::Autolock _l(callback->mLock);
            LOGD("Wait callback to be executed");
            callback->mCondition.wait(callback->mLock);
        }
        LOGD("Callback has already been executed. Ready to exit");
    
        return 0;
    }
    

    log of running timerserver and timerclient
    10-15 18:07:41.724 31646 31646 D TimerServer: TimerServer is starting
    10-15 18:07:41.724 31646 31646 D TimerServer: Successfully register service: timer
    10-15 18:07:41.724 31646 31646 D TimerServer: Ready to joinThreadPool
    10-15 18:07:43.794 31652 31652 D TimerClient: TimerClient is starting
    10-15 18:07:43.804 31652 31652 D TimerClient: Successfully get timer service
    10-15 18:07:43.804 31652 31652 D TimerClient: Request to registerCallback, timeout = 30
    10-15 18:07:43.804 31646 31648 D TimerService: Service does registerCallback timeout = 30
    10-15 18:07:43.804 31652 31652 D TimerClient: Wait callback to be executed
    10-15 18:08:13.744 31646 31649 D TimerService: Request callback to execute, timeout = 30
    10-15 18:08:13.744 31652 31654 D CallbackService: callback is executed
    10-15 18:08:13.744 31652 31652 D TimerClient: Callback has already been executed. Ready to exit

    sequence diagram of running timerserver and timerclient

    binder_timerservice_01

    conclusion
    This post demonstrates an example in which the client sends a callback to the server, then the server triggers this callback. At the end, we show the log and the sequence diagram of running this example.

