Archive for the ‘binder’ Category

android: binder: error log: binder_alloc_buf, no vma

October 19, 2015

This post is to discuss binder error log: binder_alloc_buf, no vma. The reference code base is android kernel 3.4. The log is borrowed from https://community.freescale.com/thread/342488.

symptom: after the process with pid 3057 crashes, lots of "binder: 3057: binder_alloc_buf, no vma" logs show up

[  174.653668] init: untracked pid 3113 exited
[  174.659857] init: untracked pid 3262 exited
[  174.665214] init: untracked pid 3279 exited
[  174.670170] init: untracked pid 3462 exited
[  174.674735] init: untracked pid 3538 exited
[  174.679930] init: untracked pid 3057 exited
[  174.759346] init: untracked pid 3870 exited
[  174.839368] alarm_release: clear alarm, pending 0
[  174.844092] alarm_release: clear alarm, pending 0
[  175.003312] binder: 3057: binder_alloc_buf, no vma
[  175.008343] binder: 4008:4104 transaction failed 29201, size 124-4
[  175.015018] binder: 3057: binder_alloc_buf, no vma
[  175.019899] binder: 4008:4104 transaction failed 29201, size 124-4

analysis
In android: binder: resources, we discuss that both binder_vma and the binder fd are released within do_exit, which is executed when a process crashes. Among these resources, binder_vma is released before the binder fd, because do_exit tears down the memory mappings (exit_mm) before it closes the open files (exit_files). Thus, it's very common to see these logs when a process crashes.

The log shows that the thread 4008:4104 tries to initiate a binder transaction and allocate a binder buffer within 3057’s binder_vma. However, 3057 is in the middle of do_exit and has already released its binder_vma but has not released its binder fd yet. Thus, 4008:4104 can initiate the binder transaction but fails to allocate a binder buffer in 3057’s binder_vma, and gets BR_FAILED_REPLY (which is exactly the 29201 in the "transaction failed 29201" log line).
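
As a side note, the numeric code 29201 in the failure line is BR_FAILED_REPLY itself: binder return commands are built with the ioctl _IO() macros, and BR_FAILED_REPLY is defined as _IO('r', 17) in the binder header. A tiny standalone check (any Linux host; the only assumption is the usual asm-generic _IO() encoding):

#include <cstdio>
#include <sys/ioctl.h>   /* provides the _IO() macro */

int main()
{
	/* BR_FAILED_REPLY is defined as _IO('r', 17) in the binder header. */
	printf("_IO('r', 17) = %u\n", (unsigned)_IO('r', 17));   /* prints 29201 */
	return 0;
}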

static struct binder_buffer *binder_alloc_buf(struct binder_proc *proc,
					      size_t data_size,
					      size_t offsets_size, int is_async)
{
	struct rb_node *n = proc->free_buffers.rb_node;
	struct binder_buffer *buffer;
	size_t buffer_size;
	struct rb_node *best_fit = NULL;
	void *has_page_addr;
	void *end_page_addr;
	size_t size;
	if (proc->vma == NULL) {
		printk(KERN_ERR "binder: %d: binder_alloc_buf, no vma\n",
		       proc->pid);
		return NULL;
	}
......
}
static void binder_transaction(struct binder_proc *proc,
			       struct binder_thread *thread,
			       struct binder_transaction_data *tr, int reply)
{
......
	t->buffer = binder_alloc_buf(target_proc, tr->data_size,
		tr->offsets_size, !reply && (t->flags & TF_ONE_WAY));
	if (t->buffer == NULL) {
		return_error = BR_FAILED_REPLY;
		goto err_binder_alloc_buf_failed;
	}
......
err_binder_alloc_buf_failed:
	kfree(tcomplete);
	binder_stats_deleted(BINDER_STAT_TRANSACTION_COMPLETE);
err_alloc_tcomplete_failed:
	kfree(t);
	binder_stats_deleted(BINDER_STAT_TRANSACTION);
err_alloc_t_failed:
err_bad_call_stack:
err_empty_call_stack:
err_dead_binder:
err_invalid_target_handle:
err_no_context_mgr_node:
	binder_debug(BINDER_DEBUG_FAILED_TRANSACTION,
		     "binder: %d:%d transaction failed %d, size %zd-%zd\n",
		     proc->pid, thread->pid, return_error,
		     tr->data_size, tr->offsets_size);
	{
......
}

conclusion
After a process crashes, it’s very common to see these logs since its binder_vma has already been released. But if these logs keep appearing for a long time, it is a real issue, which we’ll discuss in another post.


android: binder: resources

October 19, 2015

This post is to discuss the resources used by the binder mechanism, including when these resources are requested and released. The reference source code is the hello service in android-5.0.2_r1.

binder server: open /dev/binder

  • alloc one binder fd
  • The ProcessState constructor opens /dev/binder. This operation requires an fd. By default, each process has a limit of 1024 fds. Since each binder server opens /dev/binder at a very early stage, it seldom fails to get the fd.

    int main()
    {
        LOGD("HelloServer is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        defaultServiceManager()->addService(String16("hello"), new HelloService);
    
        LOGD("Successfully register service: hello");
        LOGD("Ready to joinThreadPool");
    
        IPCThreadState::self()->joinThreadPool();
    
        return 0;
    }
    

    binder server: mmap (2MB-8KB) of /dev/binder file

  • alloc 2MB physical memory
  • alloc 2MB user space virtual address space, i.e., binder vma
  • alloc 2MB kernel vmalloc virtual address space
  • The ProcessState constructor also mmaps about 2MB of the file returned by opening /dev/binder. This mmap allocates about 2MB of physical memory, which is mapped both by user space virtual address space (the binder vma) and by kernel vmalloc address space.

    ProcessState::ProcessState()
        : mDriverFD(open_driver())
        , mVMStart(MAP_FAILED)
        , mManagesContexts(false)
        , mBinderContextCheckFunc(NULL)
        , mBinderContextUserData(NULL)
        , mThreadPoolStarted(false)
        , mThreadPoolSeq(1)
    {
        if (mDriverFD >= 0) {
            // XXX Ideally, there should be a specific define for whether we
            // have mmap (or whether we could possibly have the kernel module
            // availabla).
    #if !defined(HAVE_WIN32_IPC)
            // mmap the binder, providing a chunk of virtual address space to receive transactions.
            mVMStart = mmap(0, BINDER_VM_SIZE, PROT_READ, MAP_PRIVATE | MAP_NORESERVE, mDriverFD, 0);
            if (mVMStart == MAP_FAILED) {
                // *sigh*
                ALOGE("Using /dev/binder failed: unable to mmap transaction memory.\n");
                close(mDriverFD);
                mDriverFD = -1;
            }
    #else
            mDriverFD = -1;
    #endif
        }
        LOG_ALWAYS_FATAL_IF(mDriverFD < 0, "Binder driver could not be opened.  Terminating.");
    }
    

    binder server: addService

  • a binder service with unique name
  • Each binder service has a unique name. You can use the command below to check all registered services.

    $ adb shell service list
    
    int main()
    {
        LOGD("HelloServer is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        defaultServiceManager()->addService(String16("hello"), new HelloService);
    
        LOGD("Successfully register service: hello");
        LOGD("Ready to joinThreadPool");
    
        IPCThreadState::self()->joinThreadPool();
    
        return 0;
    }
    

    binder client: getService
    getService gets a BpService (binder proxy) of the requested binder service.

    int main()
    {
        sp<IHelloService> service = interface_cast<IHelloService>(defaultServiceManager()->getService(String16("hello")));
    
        LOGD("service->add(4, 3) = %d", service->add(4, 3));
        LOGD("service->sub(4, 3) = %d", service->sub(4, 3));
    
        return 0;
    }
    

    binder client: request a binder transaction

  • alloc a binder buffer within the server’s binder vma, whose total size is 2MB
  • Each binder transaction requires a binder buffer in the server’s binder vma. The binder buffers of all outstanding transactions together can use at most the whole binder vma, while the buffers of all outstanding asynchronous transactions together can use at most half of it; a toy sketch of this accounting follows the kernel excerpt below.

    static int binder_mmap(struct file *filp, struct vm_area_struct *vma)
    {
    ......
            proc->buffer_size = vma->vm_end - vma->vm_start;
    ......
            proc->free_async_space = proc->buffer_size / 2;
    ......
    }
    
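    To make this accounting concrete, below is a small standalone toy model (not kernel code; the names are made up for illustration) of the rule just described: every buffer is charged against the whole vma, and asynchronous buffers are additionally charged against an async budget that starts at half of the vma.

    #include <cstddef>
    #include <cstdio>

    struct ToyBinderAlloc {
        size_t free_space;        // models what is left of the whole binder vma
        size_t free_async_space;  // models proc->free_async_space (half the vma)

        explicit ToyBinderAlloc(size_t vma_size)
            : free_space(vma_size), free_async_space(vma_size / 2) {}

        bool alloc(size_t size, bool is_async) {
            if (size > free_space)
                return false;                          // whole vma exhausted
            if (is_async && size > free_async_space)
                return false;                          // async budget exhausted
            free_space -= size;
            if (is_async)
                free_async_space -= size;
            return true;
        }
    };

    int main() {
        ToyBinderAlloc proc(2 * 1024 * 1024);                        // a 2MB binder vma
        printf("sync  1.5MB: %d\n", proc.alloc(1536 * 1024, false)); // 1 (ok)
        printf("async 1.5MB: %d\n", proc.alloc(1536 * 1024, true));  // 0 (over both limits)
        printf("async 0.4MB: %d\n", proc.alloc(400 * 1024, true));   // 1 (fits both budgets)
        return 0;
    }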

    binder server: execute the binder transaction

  • release the binder buffer allocated by the client

    binder server: exit

  • release the 2MB binder vma
  • release the 2MB vmalloc address space
  • release the 2MB physical memory
  • release the binder fd
  • In android: binder: error log: binder_alloc_buf, no vma, we discuss the error logs that show up while binder_vma is no longer available.

    conclusion
    In this post, we go through each stage of the binder mechanism, elaborate what resources are needed in each stage, and discuss the situations in which those resources are not available.

    android: binder: native binder example: client sends a callback to server, then server triggers this callback

    October 16, 2015

    This post demonstrates an example in which a client sends a callback to a server, and the server then triggers this callback. The test code base is Android 5.0.2 LRX22G.

    source code
    The timerservice-and-callbackservice has four modules: libtimerservice, libcallbackservice, timerclient, and timerserver. Below explains the functionality of each module.

    libtimerservice
    This module declares and implements timer service which provides API registerCallback.

    The class ITimerService declares the API registerCallback.

    class ITimerService: public IInterface {
    public:
        DECLARE_META_INTERFACE(TimerService);
    
        virtual void registerCallback(sp<IBinder>& binder, int timeout) = 0;
    };
    

    The class TimerService implements the API registerCallback.

    void TimerService::registerCallback(sp<IBinder>& binder, int timeout)
    {
        Mutex::Autolock _l(mLock);
    
        sp<ICallbackService> callback = interface_cast<ICallbackService>(binder);
        mCallbacks.push(callback);
        mTimeouts.add(callback, timeout);
        LOGD("Service does registerCallback timeout = 30");
    }
    
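    The post does not show the server-side thread that fires these callbacks, so here is a hypothetical sketch of what the TimerServiceThread mentioned later could look like; the member names follow registerCallback above, but timerThreadLoop itself and its details are illustrative assumptions, not code from the post's source.

    // Hypothetical sketch: decrement each registered timeout once per second and,
    // when it expires, call execute() on the stored ICallbackService proxy, which
    // becomes a binder transaction back into the registering client.
    void TimerService::timerThreadLoop()
    {
        for (;;) {
            sleep(1);   // coarse one-second tick

            Vector<sp<ICallbackService> > expired;
            {
                Mutex::Autolock _l(mLock);
                for (size_t i = 0; i < mCallbacks.size(); ) {
                    sp<ICallbackService> cb = mCallbacks[i];
                    int remaining = mTimeouts.valueFor(cb) - 1;
                    if (remaining <= 0) {
                        expired.push(cb);
                        mTimeouts.removeItem(cb);
                        mCallbacks.removeAt(i);   // fire once, then forget the callback
                    } else {
                        mTimeouts.replaceValueFor(cb, remaining);
                        i++;
                    }
                }
            }

            // Do the binder calls outside the lock.
            for (size_t i = 0; i < expired.size(); i++) {
                LOGD("Request callback to execute, timeout = 30");
                expired[i]->execute();
            }
        }
    }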

    libcallbackservice
    This module declares and implements callback service which provides API execute.

    The class ICallbackService declares the API execute.

    class ICallbackService: public IInterface {
    public:
        DECLARE_META_INTERFACE(CallbackService);
    
        virtual void execute() = 0;
    };
    

    The class CallbackService implements the API execute.

    void CallbackService::execute()
    {
        Mutex::Autolock _l(mLock);
    
        LOGD("callback is executed");
        mCondition.signal();
    }
    

    timerserver
    timerserver registers the timer service.

    int main()
    {
        LOGD("TimerServer is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        defaultServiceManager()->addService(String16("timer"), new TimerService);
    
        LOGD("Successfully register service: timer");
        LOGD("Ready to joinThreadPool");
    
        IPCThreadState::self()->joinThreadPool();
    
        return 0;
    }
    

    timerclient
    timerclient gets the timer service, sends a callback to it, and waits for the TimerServiceThread of timerserver to trigger the callback. After a binder thread of timerclient executes the callback, the main thread of timerclient exits the process.

    int main()
    {
        LOGD("TimerClient is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        sp<ITimerService> service = interface_cast<ITimerService>(defaultServiceManager()->getService(String16("timer")));
        LOGD("Successfully get timer service");
    
        sp<CallbackService> callback(new CallbackService);
        sp<IBinder> binder(callback.get());
    
        LOGD("Request to registerCallback, timeout = 30");
        service->registerCallback(binder, 30);
    
        {
            Mutex::Autolock _l(callback->mLock);
            LOGD("Wait callback to be executed");
            callback->mCondition.wait(callback->mLock);
        }
        LOGD("Callback has already been executed. Ready to exit");
    
        return 0;
    }
    

    log of running timerserver and timerclient
    10-15 18:07:41.724 31646 31646 D TimerServer: TimerServer is starting
    10-15 18:07:41.724 31646 31646 D TimerServer: Successfully register service: timer
    10-15 18:07:41.724 31646 31646 D TimerServer: Ready to joinThreadPool
    10-15 18:07:43.794 31652 31652 D TimerClient: TimerClient is starting
    10-15 18:07:43.804 31652 31652 D TimerClient: Successfully get timer service
    10-15 18:07:43.804 31652 31652 D TimerClient: Request to registerCallback, timeout = 30
    10-15 18:07:43.804 31646 31648 D TimerService: Service does registerCallback timeout = 30
    10-15 18:07:43.804 31652 31652 D TimerClient: Wait callback to be executed
    10-15 18:08:13.744 31646 31649 D TimerService: Request callback to execute, timeout = 30
    10-15 18:08:13.744 31652 31654 D CallbackService: callback is executed
    10-15 18:08:13.744 31652 31652 D TimerClient: Callback has already been executed. Ready to exit

    sequence diagram of running timerserver and timerclient

    binder_timerservice_01

    conclusion
    This post demonstrates an example in which a client sends a callback to a server, and the server then triggers this callback. The end of the post shows the log and the sequence diagram of running this example.

    android: binder: native binder service example: simple.account service

    October 4, 2015

    This post demonstrates a native binder service example: simple.account service. This service supports three APIs: addAccount, removeAccount, and listAccounts. The reference code base is Android 5.0.2 LRX22G.

    simple.account service
    The simple.account service has three modules: libaccountservice, accountserver, and accountclient. Below explains the functionality of each module.

    libaccountservice
    This module declares and implements simple.account service which supports three APIs: addAccount, removeAccount, and listAccounts.

    class IAccountService declares the interfaces of simple.account service. Three APIs, addAccount, removeAccount, and listAccounts are declared.

    class IAccountService: public IInterface {
    public:
        DECLARE_META_INTERFACE(AccountService);
    
        virtual int addAccount(const String16&) = 0;
        virtual int removeAccount(const String16&) = 0;
        virtual Vector<String16> listAccounts() = 0;
    };
    

    class AccountService inherits class BnAccountService, which inherits class BnInterface<IAccountService>. class AccountService implements the APIs.

    int AccountService::addAccount(const String16& name)
    {
        for (size_t i = 0; i < mAccounts.size(); i++) {
            if (mAccounts[i] == name)
                return -1;
        }
    
        mAccounts.push(name);
    
        return 0;
    }
    
    int AccountService::removeAccount(const String16& name)
    {
        int index = -1;
    
        for (size_t i = 0; i < mAccounts.size(); i++) {
            if (mAccounts.itemAt(i) == name) {
                index = i;
                break;
            }
        }
    
        if (index < 0)
            return -1;
    
        mAccounts.removeAt(index);
    
        return 0;
    }
    
    Vector<String16> AccountService::listAccounts()
    {
        return mAccounts;
    }
    

    accountserver
    accountserver registers simple.account service.

    int main()
    {
        LOGD("AccountServer is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        defaultServiceManager()->addService(String16("simple.account"), new AccountService);
    
        LOGD("Successfully register service: simple.account");
        LOGD("Ready to joinThreadPool");
    
        IPCThreadState::self()->joinThreadPool();
    
        return 0;
    }
    

    accountclient
    accountclient gets simple.account service and calls APIs.

    void showAccounts(const Vector<String16>& accounts)
    {
        LOGD("There are %d accounts", accounts.size());
    
        for (size_t i = 0; i < accounts.size(); i++)
            LOGD("account %d is %s", i + 1, String8(accounts[i]).string());
    }
    
    int main()
    {
        sp<IAccountService> service = interface_cast<IAccountService>(defaultServiceManager()->getService(String16("simple.account")));
    
        service->addAccount(String16("john"));
        showAccounts(service->listAccounts());
    
        service->addAccount(String16("tom"));
        showAccounts(service->listAccounts());
    
        service->removeAccount(String16("john"));
        showAccounts(service->listAccounts());
    
        return 0;
    }
    

    log of running accountserver and accountclient

    10-04 12:35:40.821 13732 13732 D AccountServer: AccountServer is starting
    10-04 12:35:40.821 13732 13732 D AccountServer: Successfully register service: simple.account
    10-04 12:35:40.821 13732 13732 D AccountServer: Ready to joinThreadPool
    10-04 12:35:43.231 13736 13736 D AccountClient: There are 1 accounts
    10-04 12:35:43.231 13736 13736 D AccountClient: account 1 is john
    10-04 12:35:43.231 13736 13736 D AccountClient: There are 2 accounts
    10-04 12:35:43.231 13736 13736 D AccountClient: account 1 is john
    10-04 12:35:43.231 13736 13736 D AccountClient: account 2 is tom
    10-04 12:35:43.241 13736 13736 D AccountClient: There are 1 accounts
    10-04 12:35:43.241 13736 13736 D AccountClient: account 1 is tom
    

    conclusion
    This post demonstrates a native binder service example: the simple.account service. The end of the post shows the log of running this service.

    android: binder: native binder service example: hello service

    October 1, 2015

    This post demonstrates a native binder service example: hello service. This service supports two APIs: add and sub. The reference code base is Android 5.0.2 LRX22G.

    hello service
    The hello service has three modules: libhelloservice, helloserver, and helloclient. Below explains the functionality of each module.

    libhelloservice
    This module declares and implements hello service which supports two APIs: add and sub.

    class IHelloService declares the interfaces of hello service. Two APIs, add and sub, are declared.

    class IHelloService: public IInterface {
    public:
        DECLARE_META_INTERFACE(HelloService);
    
        virtual int add(int , int) = 0;
        virtual int sub(int , int) = 0;
    };
    

    class HelloService inherits class BnHelloService, which inherits class BnInterface<IHelloService>. class HelloService implements the add and sub APIs.

    int HelloService::add(int a, int b)
    {
        return a + b;
    }
    
    int HelloService::sub(int a, int b)
    {
        return a - b;
    }
    

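    The post omits the proxy (Bp) and native (Bn) glue between these pieces, so below is a minimal sketch of how they are typically wired up with the standard libbinder macros. The transaction codes and the descriptor string are assumptions for illustration, not taken from the post's source.

    // Illustrative sketch of the usual Bp/Bn boilerplate for IHelloService.
    enum {
        ADD = IBinder::FIRST_CALL_TRANSACTION,
        SUB,
    };

    class BpHelloService : public BpInterface<IHelloService> {
    public:
        BpHelloService(const sp<IBinder>& impl) : BpInterface<IHelloService>(impl) {}

        virtual int add(int a, int b) {
            Parcel data, reply;
            data.writeInterfaceToken(IHelloService::getInterfaceDescriptor());
            data.writeInt32(a);
            data.writeInt32(b);
            remote()->transact(ADD, data, &reply);   // synchronous binder transaction
            return reply.readInt32();
        }

        virtual int sub(int a, int b) {
            Parcel data, reply;
            data.writeInterfaceToken(IHelloService::getInterfaceDescriptor());
            data.writeInt32(a);
            data.writeInt32(b);
            remote()->transact(SUB, data, &reply);
            return reply.readInt32();
        }
    };

    IMPLEMENT_META_INTERFACE(HelloService, "example.IHelloService");

    status_t BnHelloService::onTransact(uint32_t code, const Parcel& data,
                                        Parcel* reply, uint32_t flags)
    {
        switch (code) {
        case ADD: {
            CHECK_INTERFACE(IHelloService, data, reply);
            int a = data.readInt32();
            int b = data.readInt32();
            reply->writeInt32(add(a, b));
            return NO_ERROR;
        }
        case SUB: {
            CHECK_INTERFACE(IHelloService, data, reply);
            int a = data.readInt32();
            int b = data.readInt32();
            reply->writeInt32(sub(a, b));
            return NO_ERROR;
        }
        default:
            return BBinder::onTransact(code, data, reply, flags);
        }
    }

    The service manager only hands the client an IBinder handle; it is the DECLARE_META_INTERFACE/IMPLEMENT_META_INTERFACE pair that lets interface_cast<IHelloService>() wrap that handle in a BpHelloService on the client side.
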
    helloserver
    helloserver registers hello service.

    int main()
    {
        LOGD("HelloServer is starting");
    
        sp<ProcessState> proc(ProcessState::self());
        ProcessState::self()->startThreadPool();
    
        defaultServiceManager()->addService(String16("hello"), new HelloService);
    
        LOGD("Successfully register service: hello");
        LOGD("Ready to joinThreadPool");
    
        IPCThreadState::self()->joinThreadPool();
    
        return 0;
    }
    

    helloclient
    helloclient gets hello service and calls add, sub.

    int main()
    {
        sp<IHelloService> service = interface_cast<IHelloService>(defaultServiceManager()->getService(String16("hello")));
    
        LOGD("service->add(4, 3) = %d", service->add(4, 3));
        LOGD("service->sub(4, 3) = %d", service->sub(4, 3));
    
        return 0;
    }
    

    log of running helloserver and helloclient

    10-01 19:13:26.409 14943 14943 D HelloServer: HelloServer is starting
    10-01 19:13:26.409 14943 14943 D HelloServer: Successfully register service: hello
    10-01 19:13:26.409 14943 14943 D HelloServer: Ready to joinThreadPool
    10-01 19:13:29.579 14947 14947 D HelloClient: service->add(4, 3) = 7
    10-01 19:13:29.579 14947 14947 D HelloClient: service->sub(4, 3) = 1
    

    conclusion
    This post demonstrates a native binder service example: the hello service. The end of the post shows the log of running this service.

    android: binder: binder thread encounters native crash due to bad commands from binder driver

    September 4, 2015

    This post is to discuss a case in which a binder thread encounters a native crash due to bad commands from the binder driver. The reference code is a qualcomm msm8974 release in which the frameworks are KitKat 4.4.2 and the kernel is Linux 3.4.0.

    Symptom: IPCThreadState: *** BAD COMMAND 29190 received from Binder driver, and then the binder thread calls abort
    For this case, I’ll introduce several binder transaction examples, and finally demonstrate under which conditions this case happens.

    06-14 21:46:25.958  1067  1526 E IPCThreadState: *** BAD COMMAND 29190 received from Binder driver
    06-14 21:46:25.958  1067  1526 E IPCThreadState: getAndExecuteCommand(fd=12) returned unexpected error -2147483648, aborting
    06-14 21:46:25.958  1067  1526 F libc    : Fatal signal 6 (SIGABRT) at 0x0000042b (code=-6), thread 1526 (Binder_B)
    
    pid: 1067, tid: 1526, name: Binder_B  >>> system_server <<<
    signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
        r0 00000000  r1 000005f6  r2 00000006  r3 00000000
        r4 00000006  r5 0000000c  r6 000005f6  r7 0000010c
        r8 4019b081  r9 81c4a000  sl 81c49bac  fp 4013f2f4
        ip 00000000  sp 81d47cc0  lr 40103c79  pc 40112c70  cpsr 000f0010
        d0  0000000000000000  d1  0000000000000000
        d2  0000000000000000  d3  0000000000000000
        d4  3a30333030303030  d5  3730303936303020
        d6  3030653630302034  d7  3038363030206636
        d8  0000000000000000  d9  0000000000000000
        d10 0000000000000000  d11 0000000000000000
        d12 0000000000000000  d13 0000000000000000
        d14 0000000000000000  d15 0000000000000000
        d16 6e72757465722029  d17 7078656e75206465
        d18 520a30203a657a69  d19 2064657669656365
        d20 73646e616d6d6f63  d21 7264206d6f726620
        d22 3030203a72657669  d23 3020633032373030
        d24 3f56c16c16c76a94  d25 3f81111111185da8
        d26 3fa555555555551c  d27 3fc55555555554db
        d28 3fe0000000000000  d29 0000000000000001
        d30 fff0000000000000  d31 4000000000000000
        scr 68000011
    
    backtrace:
        #00  pc 00022c70  /system/lib/libc.so (tgkill+12)
        #01  pc 00013c75  /system/lib/libc.so (pthread_kill+48)
        #02  pc 00013e89  /system/lib/libc.so (raise+10)
        #03  pc 00012a23  /system/lib/libc.so
        #04  pc 00022524  /system/lib/libc.so (abort+4)
        #05  pc 00021809  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+124)
        #06  pc 0002658d  /system/lib/libbinder.so
        #07  pc 00012635  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
        #08  pc 0004fdf7  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+98)
        #09  pc 000120b7  /system/lib/libutils.so
        #10  pc 0000d2c0  /system/lib/libc.so (__thread_entry+72)
        #11  pc 0000d458  /system/lib/libc.so (pthread_create+240)
    

    preliminary: example#1: an asynchronous binder transaction

    1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).

    transaction_01

    preliminary: example#2: a synchronous binder transaction

    1. The client thread(3684:3684) requests a synchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).

    transaction_02

    preliminary: example#3: an asynchronous binder transaction which triggers another synchronous binder transaction

    1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).
    2. The client thread(1067:1526) requests a synchronous binder transaction to a binder service provided by the process(387). This transaction is served by the binder thread(387:1008).

    transaction_03

    preliminary: example#4: two asynchronous binder transactions to the same service, each further triggering another binder transaction

    1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the thread(1067:1526).
    2. The client thread(1067:1526) requests a synchronous binder transaction to a binder service provided by the process(387). This transaction is served by the binder thread(387:1008).
    3. The client thread(1726:1726) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the thread(1067:1526).
    4. The client thread(1067:1526) requests an asynchronous binder transaction to a binder service provided by the process(4267). This transaction is served by the binder thread(4267:5488).

    transaction_04

    Why is the asynchronous transaction requested by the client thread(1726:1726) blocked until the asynchronous transaction requested by the client thread(3684:3684) is completed by the binder thread(1067:1526)?

    • Both the client thread(3684:3684) and the client thread(1726:1726) request an asynchronous transaction to the same binder service provided by the process(1067).
    • The asynchronous transaction requested by the client thread(1726:1726) is en-queued into the binder service’s todo list, since another asynchronous transaction to the same binder service, requested by the client thread(3684:3684), is being served.
    • The en-queued asynchronous transaction is moved into the binder thread(1067:1526)’s todo list after the binder thread(1067:1526) completes the asynchronous transaction requested by the client thread(3684:3684).
    • The binder thread(1067:1526) de-queues and executes the asynchronous binder transaction requested by the client thread(1726:1726).

    analysis: what could make this symptom happen
    If a BC_FREE_BUFFER command is sent as in the diagram below, then the following situation happens:

    1. The binder thread(1726:1726) receives two BR_TRANSACTION commands consecutively.
    2. The binder thread(1726:1726) requests two BC_TRANSACTION commands consecutively.
    3. The binder thread(1726:1726) receives two BR_TRANSACTION_COMPLETE commands consecutively.

    This symptom happens when a binder thread receives two BR_TRANSACTION_COMPLETE commands consecutively: the second one is read by executeCommand(), which does not handle BR_TRANSACTION_COMPLETE, so it is reported as BAD COMMAND 29190 (0x7206, which is exactly BR_TRANSACTION_COMPLETE). So the problem reduces to why the binder thread(1726:1726) requests BC_FREE_BUFFER unexpectedly.
    transaction_05

    analysis: why the binder thread(1726:1726) requests BC_FREE_BUFFER unexpectedly
    The BC_FREE_BUFFER command is sent from processPendingDerefs(), in which some binder references are cleared and a destructor ends up freeing a binder buffer. This bug doesn’t appear in AOSP or in the pure vendor release.
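
    For reference, freeBuffer is the release function that Parcel::ipcSetDataReference() registers for a received transaction buffer; a paraphrased sketch of it (condensed, the exact lines differ slightly between branches) shows where the extra BC_FREE_BUFFER comes from when such a Parcel is destroyed inside processPendingDerefs():

    // Paraphrased sketch of IPCThreadState::freeBuffer() (KitKat-era, 32-bit):
    // it does not talk to the driver directly, it just queues a BC_FREE_BUFFER
    // command into the calling thread's mOut, to be flushed by the next
    // talkWithDriver(). If that happens from a destructor run in
    // processPendingDerefs(), the command is injected between two transactions.
    void IPCThreadState::freeBuffer(Parcel* parcel, const uint8_t* data,
                                    size_t /*dataSize*/,
                                    const size_t* /*objects*/, size_t /*objectsSize*/,
                                    void* /*cookie*/)
    {
        if (parcel != NULL) parcel->closeFileDescriptors();
        IPCThreadState* state = self();
        state->mOut.writeInt32(BC_FREE_BUFFER);
        state->mOut.writeInt32((int32_t)data);   // the kernel buffer to release
    }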

    void IPCThreadState::joinThreadPool(bool isMain)
    {
        LOG_THREADPOOL("**** THREAD %p (PID %d) IS JOINING THE THREAD POOL\n", (void*)pthread_self(), getpid());
        mOut.writeInt32(isMain ? BC_ENTER_LOOPER : BC_REGISTER_LOOPER);
        
        // This thread may have been spawned by a thread that was in the background
        // scheduling group, so first we will make sure it is in the foreground
        // one to avoid performing an initial transaction in the background.
        set_sched_policy(mMyThreadId, SP_FOREGROUND);
            
        status_t result;
        do {
            processPendingDerefs();
            // now get the next command to be processed, waiting if necessary
            result = getAndExecuteCommand();
            if (result < NO_ERROR && result != TIMED_OUT && result != -ECONNREFUSED && result != -EBADF) {
                ALOGE("getAndExecuteCommand(fd=%d) returned unexpected error %d, aborting",
                      mProcess->mDriverFD, result);
                abort();
            }
            
            // Let this thread exit the thread pool if it is no longer
            // needed and it is not the main process thread.
            if(result == TIMED_OUT && !isMain) {
                break;
            }
        } while (result != -ECONNREFUSED && result != -EBADF);
    

    conclusion
    In IPCThreadState::joinThreadPool(), the message loop of a binder thread, if processPendingDerefs() calls a destructor which calls freeBuffer, then it may send an unexpected BC_FREE_BUFFER command. The unexpected BC_FREE_BUFFER leads to two consecutive BR_TRANSACTION commands, two consecutive BC_TRANSACTION commands, and two consecutive BR_TRANSACTION_COMPLETE commands. Finally, the binder thread detects the second BR_TRANSACTION_COMPLETE as a BAD COMMAND and calls abort to exit the process.

    android: binder: arm: SIGSEGV at IPCThreadState::executeCommand in libbinder.so

    August 30, 2015

    This post is to discuss a case in which a segmentation fault happens in binder transactions. The source code is a qualcomm release in which the frameworks are android-4.4.3_KTU84L KitKat and the kernel is Linux 3.10.28. The architecture is a quad-core ARM Cortex-A53.

    symptom: segmentation fault at IPCThreadState::executeCommand in libbinder.so

    pid: 308, tid: 464, name: Binder_1  >>> /system/bin/surfaceflinger <<<
    thread: Binder_1
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
        r0 00000000  r1 b70724f4  r2 00000001  r3 00000003
        r4 b6fad0f0  r5 00000000  r6 b6f4538c  r7 000003e8
        r8 00000134  r9 b64d1000  sl bea4d85c  fp b6f432f4
        ip 000003e8  sp b65cec78  lr b6d6acb3  pc b6d676ec  cpsr 000f0030
        d0  6f207463656a6266  d1  64696f76202c6366
        d2  61503a3a64696f73  d3  65723a3a6c656365
        d4  746e69752074736e  d5  6973202c2a745f38
        d6  6f63202c745f657a  d7  657a69732074736e
        d8  0000000000000000  d9  0000000000000000
        d10 0000000000000000  d11 0000000000000000
        d12 0000000000000000  d13 0000000000000000
        d14 0000000000000000  d15 0000000000000000
        d16 41cdcd6500000000  d17 416fca0560000000
        d18 412e848000000000  d19 0000000000000000
        d20 3ff9d07c84b5dcc6  d21 40cf830000000000
        d22 3efa019fabb79d95  d23 3f2a019f8723aeaa
        d24 3f56c16c16c76a94  d25 3f81111111185da8
        d26 40cf830000000000  d27 3fc55555555554db
        d28 3fe0000000000000  d29 0000000000000001
        d30 fff0000000000000  d31 4000000000000000
        scr 68000010
    
    backtrace:
        #00  pc 000206ec  /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+587)
        #01  pc 00020a97  /system/lib/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+38)
        #02  pc 00020b0f  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+50)
        #03  pc 00025935  /system/lib/libbinder.so
        #04  pc 00012671  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
        #05  pc 000120f3  /system/lib/libutils.so
        #06  pc 0000d360  /system/lib/libc.so (__thread_entry+72)
        #07  pc 0000d4f8  /system/lib/libc.so (pthread_create+240)
    

    preliminary: binder thread and IPCThreadState
    After opening /dev/binder, a process becomes a binder process and has up to 16 binder threads. Each binder thread has its own thread-local IPCThreadState object, which can be referenced with IPCThreadState::self(). IPCThreadState::self() provides all the infrastructure a binder thread needs.

    After it is spawned, a binder thread calls joinThreadPool to enter its message loop. transact initiates a transaction, during which executeCommand is carried out. talkWithDriver is the low-level function that ioctls /dev/binder. Finally, shutdown exits a binder thread.

    class IPCThreadState
    {
    public:
        static  IPCThreadState*     self();
        static  IPCThreadState*     selfOrNull();  // self(), but won't instantiate
                sp<ProcessState>    process();
                void                joinThreadPool(bool isMain = true);
                status_t            transact(int32_t handle,
                                             uint32_t code, const Parcel& data,
                                             Parcel* reply, uint32_t flags);
        static  void                shutdown();
    private:
                status_t            talkWithDriver(bool doReceive=true);
                status_t            executeCommand(int32_t command);
                Parcel              mIn;
                Parcel              mOut;
    

    preliminary: how parcel and binder transaction works
    A binder transaction is an IPC transaction. On the client side, one thread of the client process writes the transaction data into the parcel IPCThreadState::self()->mOut and requests a binder transaction via ioctl on /dev/binder. This thread enters kernel space and allocates a binder transaction whose data is copied from the parcel mOut. Finally, the binder transaction is en-queued into the todo list of the target process.

    On the server side, a binder thread is blocked at ioctl on /dev/binder. When a transaction is en-queued into the todo list of the process, a binder thread wakes up in kernel space and copies the binder transaction data into IPCThreadState::self()->mIn.

    ipcthreadstate_parcel

    (gdb) p mIn      
    $33 = {mError = 0, mData = 0xb6fad198 "\fr", mDataSize = 48, mDataCapacity = 256, mDataPos = 48, mObjects = 0x0, mObjectsSize = 0, mObjectsCapacity = 0, mNextObjectHint = 0, mFdsKnown = true, 
      mHasFds = false, mAllowFds = true, mOwner = 0, mOwnerCookie = 0x2e}
    

    analysis: what the binder thread was doing at the time of the native crash
    In this case, the binder thread Binder_1(tid=464) of the process /system/bin/surfaceflinger(pid=308) encounters a segmentation fault while executing the command after reading a binder transaction from mIn.

    (gdb) bt
    #0  android::IPCThreadState::executeCommand (this=0xb6fad0f0, cmd=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:1088
    #1  0xb6d67a9a in android::IPCThreadState::getAndExecuteCommand (this=0xb6fad0f0) at frameworks/native/libs/binder/IPCThreadState.cpp:436
    #2  0xb6d67b12 in android::IPCThreadState::joinThreadPool (this=0xb6fad0f0, isMain=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:490
    #3  0xb6d6c938 in android::PoolThread::threadLoop (this=0xb6fab648) at frameworks/native/libs/binder/ProcessState.cpp:69
    #4  0xb6e85672 in android::Thread::_threadLoop (user=0xb6fab648) at frameworks/native/libs/utils/Threads.cpp:812
    #5  0xb6e850f4 in thread_data_t::trampoline (t=<optimized out>) at frameworks/native/libs/utils/Threads.cpp:95
    #6  0xb6f00364 in __thread_entry (func=0xb6e850bd <thread_data_t::trampoline(thread_data_t const*)>, arg=0xb6fab698, tls=0xb65cedd0) at bionic/libc/bionic/pthread_create.cpp:105
    #7  0xb6f004fc in pthread_create (thread_out=0xbea4d85c, attr=<optimized out>, start_routine=0xb6e850bd <thread_data_t::trampoline(thread_data_t const*)>, arg=0x78)
        at bionic/libc/bionic/pthread_create.cpp:224
    #8  0x00000000 in ?? ()
    
    status_t IPCThreadState::executeCommand(int32_t cmd)
    {
        BBinder* obj;
        RefBase::weakref_type* refs;
        status_t result = NO_ERROR;
    
        switch (cmd) {
            ......
            ......
        case BR_TRANSACTION:
            {
                binder_transaction_data tr;
                result = mIn.read(&tr, sizeof(tr));
                ALOG_ASSERT(result == NO_ERROR,
                    "Not enough command data for brTRANSACTION");
                if (result != NO_ERROR) break;
    
                Parcel buffer;
                buffer.ipcSetDataReference(
                    reinterpret_cast<const uint8_t*>(tr.data.ptr.buffer),
                    tr.data_size,
                    reinterpret_cast<const size_t*>(tr.data.ptr.offsets),
                    tr.offsets_size/sizeof(size_t), freeBuffer, this);
    
                const pid_t origPid = mCallingPid;
                const uid_t origUid = mCallingUid;
    
                mCallingPid = tr.sender_pid;
                mCallingUid = tr.sender_euid;
                ......
                if (tr.target.ptr) {
                    sp<BBinder> b((BBinder*)tr.cookie);
                    const status_t error = b->transact(tr.code, buffer, &reply, tr.flags);
                    if (error < NO_ERROR) reply.setError(error);
    
                } else {
                    const status_t error = the_context_object->transact(tr.code, buffer, &reply, tr.flags);
                    if (error < NO_ERROR) reply.setError(error);
                }
        }
    }
    
    status_t Parcel::read(void* outData, size_t len) const
    {
        if ((mDataPos+PAD_SIZE(len)) >= mDataPos && (mDataPos+PAD_SIZE(len)) <= mDataSize
                && len <= PAD_SIZE(len)) {
            memcpy(outData, mData+mDataPos, len);
            mDataPos += PAD_SIZE(len);
            ALOGV("read Setting data pos of %p to %d\n", this, mDataPos);
            return NO_ERROR;
        }    
        return NOT_ENOUGH_DATA;
    }
    

    analysis: check whether the mIn transmitted from the binder driver is reasonable
    The parcel mIn of the binder thread Binder_1(tid=464) looks reasonable. The binder request is from pid=1304 (euid=10071). This binder transaction is a two-way transaction and allows replies with file descriptors (flags = 0x10, TF_ACCEPT_FDS). The binder thread will execute the function with transaction code 3 of the requested service 0xb7036d20.

    (gdb) x/12x mIn.mData
    0xb6fad198:     0x0000720c      0x80287202      0xb7036d20      0xb70724f4
    0xb6fad1a8:     0x00000003      0x00000010      0x00000518      0x00002757
    0xb6fad1b8:     0x00000064      0x00000000      0xb65cf028      0xb65cf08c
    (gdb) p *((struct binder_transaction_data*)(mIn.mData+0x08))             
    $40 = {target = {handle = 3070455072, ptr = 0xb7036d20}, cookie = 0xb70724f4, code = 3, flags = 16, sender_pid = 1304, sender_euid = 10071, data_size = 100, offsets_size = 0, data = {ptr = {
          buffer = 0xb65cf028, offsets = 0xb65cf08c}, buf = "(\360\\\266\214\360\\\266"}}
    

    binder_transaction_data
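
    For orientation, the 32-bit legacy layout of the structure that the dump above is cast to is roughly as follows (field names per the kernel uapi header; the comments map them to the values in this dump, and the exact header in this release may differ slightly). Note that the BR_TRANSACTION command word 0x80287202 read from mIn encodes a 0x28-byte (40-byte) payload, which is exactly sizeof(binder_transaction_data) here.

    #include <sys/types.h>   /* pid_t, uid_t */
    #include <stdint.h>
    #include <stddef.h>

    struct binder_transaction_data {
        union {
            size_t  handle;           /* target of a BC_TRANSACTION */
            void    *ptr;             /* target of a BR_TRANSACTION: 0xb7036d20 */
        } target;
        void            *cookie;      /* the local BBinder*: 0xb70724f4 */
        unsigned int    code;         /* transaction code: 3 */
        unsigned int    flags;        /* 0x10 = TF_ACCEPT_FDS */
        pid_t           sender_pid;   /* 1304 */
        uid_t           sender_euid;  /* 10071 */
        size_t          data_size;    /* 100 bytes of payload */
        size_t          offsets_size; /* 0: no binder objects in the payload */
        union {
            struct {
                const void  *buffer;  /* payload, mapped in the receiver's binder vma */
                const void  *offsets; /* offsets of flat_binder_object entries */
            } ptr;
            uint8_t buf[8];
        } data;
    };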

    analysis: check whether tr copied from mIn.mData+8 is corrupted
    The binder thread Binder_1(tid=464) copies 40 bytes from mIn.mData+8 to the local variable tr. Then it executes the binder transaction using tr as the argument. The expected content of tr is below.
    binder_executecommand_tr_01

    However, the coredump shows that the content of tr is shifted by 4 bytes relative to mIn.mData+8. Now the problem reduces to why memcpy copies only 36 bytes with a 4-byte shift when it is supposed to copy 40 bytes of data. Also, whenever this case happens, dst[3:0] is always 4, e.g., dst = 0xb65cecf4.
    binder_executecommand_tr_02

    (gdb) x/12x mIn.mData
    0xb6fad198:     0x0000720c      0x80287202      0xb7036d20      0xb70724f4
    0xb6fad1a8:     0x00000003      0x00000010      0x00000518      0x00002757
    0xb6fad1b8:     0x00000064      0x00000000      0xb65cf028      0xb65cf08c
    (gdb) x/12x &tr 
    0xb65cecf4:     0x00000000      0xb7036d20      0xb70724f4      0x00000003
    0xb65ced04:     0x00000010      0x00000518      0x00002757      0x00000064
    0xb65ced14:     0x00000000      0xb65cf028      0xac59912d      0xb6fad0f0
    

    preliminary: how bionic memcpy works on cortex-a53

    According to cortex-a53.mk, the implementation of memcpy for cortex-a53 is in bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S.

    include bionic/libc/arch-arm/cortex-a15/cortex-a15.mk
    
  • initial condition
    • R0 is dst of memcpy
      R1 is src of memcpy
      Assume that while this issue happens, dst[3:0] = 4

    memcpy_40bytes_01

  • copy 4 bytes to make r0(dest) 8 byte alignment
  • // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]!
    

    memcpy_40bytes_02

  • copy 8 bytes to make r0 16 byte alignment
  • // Copies 8 bytes, dst 64 bits aligned before, at least 128 bits after.
    vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0, :64]!
    

    memcpy_40bytes_03

  • copy 16 bytes
  • // Copies 16 bytes, destination 128 bits aligned.
    vld1.8      {d0, d1}, [r1]!
    vst1.8      {d0, d1}, [r0, :128]!
    

    memcpy_40bytes_04

  • src remaining is less than 16 bytes, copy 8 bytes
  • vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0]!
    

    memcpy_40bytes_05

  • src remaining is less than 8 bytes, copy 4 bytes
  • 1:      bge         2f
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0]!
    

    memcpy_40bytes_06

    Let (int8_t*)a = &mIn.mData[8] and dst[3:0] = 0x4; the content of the SIMD registers D0, D1, D2, and D3 should be as below after the 40-byte memcpy.

    d0 = {a[36], a[29], a[30], a[31], a[32], a[33], a[34], a[35]}
    d1 = {a[37], a[21], a[22], a[23], a[24], a[25], a[26], a[27]} 
    d2 = {a[38], ……}
    d3 = {a[39], ……}
    

    analysis: possible reason for memcpy copying only 36 bytes with a 4-byte shift
    One possible reason for memcpy copying only 36 bytes with a 4-byte shift is that line 108 of bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S is skipped by the CPU while the other instructions are executed correctly. Let’s check what happens if this line is skipped.

     // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]!
    
  • initial condition
    • R0 is dst of memcpy
      R1 is src of memcpy
      Assume that while this issue happens, dst[3:0] = 4

    memcpy_40bytes_01

  • copy 4 bytes to make r0(dest) 8 byte alignment
  • // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]! -> skipped
    

    memcpy_40bytes_02

  • copy 8 bytes to make r0 16 byte alignment
  • // Copies 8 bytes, dst 64 bits aligned before, at least 128 bits after.
    vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0, :64]!
    

    memcpy_40bytes_03

  • copy 16 bytes
  • // Copies 16 bytes, destination 128 bits aligned.
    vld1.8      {d0, d1}, [r1]!
    vst1.8      {d0, d1}, [r0, :128]!
    

    memcpy_40bytes_04

  • src remaining is less than 16 bytes, copy 8 bytes
  • vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0]!
    

    memcpy_40bytes_05

  • src remaining is less than 8 bytes, copy 4 bytes
  • 1:      bge         2f
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0]!
    

    memcpy_40bytes_06

    If line 108 is indeed skipped, let (int8_t*)a = &mIn.mData[8]; the content of the SIMD registers D0, D1, D2, and D3 after memcpy should be as below.

    d0 = {a[32], a[25], a[26], a[27], a[28], a[29], a[30], a[31]}
    d1 = {a[33], a[17], a[18], a[19], a[20], a[21], a[22], a[23]} 
    d2 = {a[34], ……}
    d3 = {a[35], ……}
    
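    The same effect can be replayed with a small standalone toy (plain C++, not the bionic assembly): it follows the copy schedule described above for a 40-byte memcpy whose destination has dst[3:0] = 4, once normally and once with the first 4-byte store skipped, and shows the resulting 4-byte shift.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Replays the 4 + 8 + 16 + 8 + 4 byte copy schedule described above.
    // If the first 4-byte store is skipped, the source pointer still advances
    // (the vld4.8 executed), so everything afterwards lands 4 bytes too early.
    static void copy40(const uint8_t* src, uint8_t* dst, bool skip_first_store) {
        size_t s = 0, d = 0;
        if (!skip_first_store) { memcpy(dst + d, src + s, 4); d += 4; }
        s += 4;
        memcpy(dst + d, src + s, 8);  s += 8;  d += 8;
        memcpy(dst + d, src + s, 16); s += 16; d += 16;
        memcpy(dst + d, src + s, 8);  s += 8;  d += 8;
        memcpy(dst + d, src + s, 4);  s += 4;  d += 4;   // d ends at 40, or 36 if skipped
    }

    int main() {
        uint8_t src[40], good[40] = {0}, bad[40] = {0};
        for (int i = 0; i < 40; i++) src[i] = (uint8_t)i;
        copy40(src, good, false);
        copy40(src, bad, true);
        printf("normal : dst[4] = %d (src[4])\n", good[4]);      // 4: no shift
        printf("skipped: dst[4] = %d (src[8])\n", bad[4]);       // 8: shifted by 4 bytes
        printf("skipped: dst[36..39] untouched: %d\n", bad[36]); // 0: only 36 bytes written
        return 0;
    }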

    analysis: how to prove that line 108 of bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S is skipped
    The content of the d0, d1, d2, and d3 registers could help prove whether line 108 is skipped. However, by the time the native crash happens, these registers have already been reused by other code, so we couldn’t see the content of d0, d1, d2, and d3 right after memcpy. Thus, I use a debug patch to force a native crash immediately if memcpy doesn’t work correctly.

    The resulting coredump and tombstone show that the values of d0, d1, d2, and d3 are the same as when line 108 is skipped, so I believe this assumption is true.
    d0 = {a[32], a[25], a[26], a[27], a[28], a[29], a[30], a[31]}
    d1 = {a[33], a[17], a[18], a[19], a[20], a[21], a[22], a[23]}
    d2 = {a[34], ……}
    d3 = {a[35], ……}

    pid: 975, tid: 1315, name: Binder_5  >>> system_server <<<
    thread: Binder_5
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbeef
        r0 00012345  r1 5ed5ed39  r2 00000004  r3 deadbeef
        r4 614e6958  r5 00000028  r6 00000028  r7 0000000a
        r8 5f978cc4  r9 0000000a  sl 401f7206  fp 4015e2f4
        ip 000000c0  sp 5f978c28  lr 401f2a5f  pc 401f2a64  cpsr 800f0030
        d0  0000000000000108  d1  0000272b000031c3
        d2  0061002e00650068  d3  006f007200640059
        d4  006c007500610066  d5  ffffffff00000074
        d6  0000000000000025  d7  ffffffff00000001
        d8  0000000000000000  d9  0000000000000000
        d10 0000000000000000  d11 0000000000000000
        d12 0000000000000000  d13 0000000000000000
        d14 0000000000000000  d15 0000000000000000
        d16 0004000400040004  d17 0004000400040004
        d18 0002000200020002  d19 0002000200020002
        d20 0000000000000000  d21 0000000000000000
        d22 0000000000000000  d23 0000000000000000
        d24 0080008000800080  d25 0080008000800080
        d26 0080008000800080  d27 0080008000800080
        d28 0080008000800080  d29 0080008000800080
        d30 0000000000000000  d31 8080808080808080
        scr 68000011
    
    backtrace:
        #00  pc 00023a64  /system/lib/libbinder.so (android::Parcel::read(void*, unsigned int) const+151)
        #01  pc 000206d9  /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+424)
        #02  pc 00020b4f  /system/lib/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+38)
        #03  pc 00020bd7  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+66)
        #04  pc 00025ac1  /system/lib/libbinder.so
        #05  pc 00012671  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
        #06  pc 0004fb4f  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+98)
        #07  pc 000120f3  /system/lib/libutils.so
        #08  pc 0000d360  /system/lib/libc.so (__thread_entry+72)
        #09  pc 0000d4f8  /system/lib/libc.so (pthread_create+240)
    
    (gdb) bt
    #0  0x401f2a64 in android::Parcel::read (this=0x614e6958, outData=0x5f978cc4, len=40) at vendor/htc/native/libs/binder/Parcel.cpp:932
    #1  0x401ef6dc in android::IPCThreadState::executeCommand (this=0x614e6928, cmd=-2144833022) at vendor/htc/native/libs/binder/IPCThreadState.cpp:1059
    #2  0x401efb52 in android::IPCThreadState::getAndExecuteCommand (this=0x614e6928) at vendor/htc/native/libs/binder/IPCThreadState.cpp:448
    #3  0x401efbda in android::IPCThreadState::joinThreadPool (this=0x0, isMain=<optimized out>) at vendor/htc/native/libs/binder/IPCThreadState.cpp:504
    #4  0x5f87b000 in ?? ()
    #5  0x5f87b000 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb) x/12 mData
    0x5ed5ed30:     0x0000720c      0x80287202      0x54ad22c0      0x547d2cf0
    0x5ed5ed40:     0x0000001d      0x00000010      0x000031b3      0x0000272b
    0x5ed5ed50:     0x00000110      0x00000000      0x5968c308      0x5968c418
    (gdb) x/12 outData
    0x5f978cc4:     0x00000000      0x54ad22c0      0x547d2cf0      0x0000001d
    0x5f978cd4:     0x00000010      0x000031b3      0x0000272b      0x00000110
    0x5f978ce4:     0x00000000      0x5968c308      0x62efbc00      0x614e692
    

    analysis: why an instruction is skipped by the CPU
    At first, I thought it was a hardware issue. After working with vendor developers, the root cause turned out to be something wrong in an exception handler.

    While a NEON instruction is executed in user space, an exception may occur and the CPU enters an exception handler. Usually, an exception handler deals with the exception, decreases the PC by 4, and returns to the context where the exception happened. In this case, something went wrong in the exception handler: the CPU returned to the user space context without decreasing the PC by 4, so a NEON instruction was skipped.

    conclusion
    Usually, an exception handler deals with the exception, decreases the PC by 4, and returns to the context where the exception happened. If something is wrong in the exception handler and the CPU returns to the original context without decreasing the PC by 4, then an instruction is skipped. In this case, a NEON instruction in memcpy was skipped, and that made the process hit a native crash in binder transactions.

    android: binder: kernel panic: binder_get_ref

    August 23, 2015

    This post is to discuss debugging a kernel panic at binder_get_ref. The reference source code here is the qualcomm msm kernel release 3.4.0.

    [  638.949732] c1    936 Unable to handle kernel paging request at virtual address e714c805
    [  638.949732] c1    936 pgd = dd99c000
    [  638.949763] c1    936 [e714c805] *pgd=9f075811, *pte=00000000, *ppte=00000000
    [  638.949763] c1    936 DFSR=00000007, TTBCR=00000000, TTBR0=9e39c06a, TTBR1=8060406a
    [  638.949763] c1    936 MAIR0=ff0881a8, MAIR1=40e040e0, SCTLR=10c5787d
    [  638.949793] c1    936 Internal error: Oops: 7 [#1] PREEMPT SMP ARM
    [  638.949793] c1    936 Modules linked in: bcmdhd dm_crypt(O) moc_crypto(PO) moc_platform_mod(O)
    [  638.949824] c1    936 CPU: 1    Tainted: P        W  O  (3.4.10-gc50a6b2 #1)
    [  638.949854] c1    936 PC is at binder_get_ref+0x8/0x34
    [  638.949854] c1    936 LR is at binder_transaction_buffer_release+0x190/0x340
    [  638.949854] c1    936 pc : [<c050c304>]    lr : [<c050e48c>]    psr: a0000013
    [  638.949854] c1    936 sp : dae7ddd8  ip : 766045f4  fp : 000001cc
    [  638.949885] c1    936 r10: 00000000  r9 : dc3e6400  r8 : 00000068
    [  638.949885] c1    936 r7 : eb000288  r6 : c0cf22f8  r5 : 0001734e  r4 : eb000220
    [  638.949885] c1    936 r3 : e714c7dd  r2 : 000001cb  r1 : 000001cc  r0 : da231580
    …
    [  638.952601] c1    936 Process Binder_4 (pid: 936, stack limit = 0xdae7c2f8)
    …
    [  638.952815] c1    936 [<c050c304>] (binder_get_ref+0x8/0x34) from [<c050e48c>] (binder_transaction_buffer_release+0x190/0x340)
    [  638.952845] c1    936 [<c050e48c>] (binder_transaction_buffer_release+0x190/0x340) from [<c050fb1c>] (binder_thread_write+0x620/0x1f7c)
    [  638.952845] c1    936 [<c050fb1c>] (binder_thread_write+0x620/0x1f7c) from [<c0512238>] (binder_ioctl+0x1d4/0x588)
    [  638.952876] c1    936 [<c0512238>] (binder_ioctl+0x1d4/0x588) from [<c014d2cc>] (do_vfs_ioctl+0x4d8/0x54c)
    [  638.952876] c1    936 [<c014d2cc>] (do_vfs_ioctl+0x4d8/0x54c) from [<c014d374>] (sys_ioctl+0x34/0x54)
    [  638.952906] c1    936 [<c014d374>] (sys_ioctl+0x34/0x54) from [<c000e480>] (ret_fast_syscall+0x0/0x30)
    [  638.952906] c1    936 Code: e1a00003 e12fff1e e5903010 ea000006 (e5932028) 
    [  638.952937] c1    936 ---[ end trace 3d62df4cc805c0c7 ]---
    [  638.952968] c1    936 Kernel panic - not syncing: Binder_4 PC:binder_get_ref+0x8/0x34 LR:binder_transaction_buffer_release+0x190/0x340
    

    analysis
    In the kernel binder driver, a ref of a process points to a node in some process: a ref is a client-side reference and a node is a service. Each process uses an rb tree to maintain all of its refs, and another rb tree to maintain all of its nodes.

    In this case, the thread Binder_4(tid=936) requests a binder transaction through a binder proxy whose descriptor (desc) is 0x1cc. The binder driver tries to find the ref whose desc is 0x1cc in the refs_by_desc rb tree.

    This panic happens in binder_get_ref(binder_proc=0xdc3e6400, desc=0x1cc). The register dump shows r3 = 0xe714c7dd (the rb node being visited) and desc = r1 = 0x1cc. The kernel panic happens while dereferencing ref->desc, i.e., loading from [0xe714c7dd + 0x28] = [0xe714c805].

    static struct binder_ref *binder_get_ref(struct binder_proc *proc,
                         uint32_t desc)
    {
        struct rb_node *n = proc->refs_by_desc.rb_node;
        struct binder_ref *ref;
    
        while (n) {
            ref = rb_entry(n, struct binder_ref, rb_node_desc);
    
            if (desc < ref->desc)
                n = n->rb_left;
            else if (desc > ref->desc)
                n = n->rb_right;
            else
                return ref; 
        }
        return NULL;
    }
    
    c050c2fc <binder_get_ref>:
    c050c2fc:   e5903010    ldr r3, [r0, #16]
    c050c300:   ea000006    b   c050c320 <binder_get_ref+0x24>
    c050c304:   e5932028    ldr r2, [r3, #40]   ; 0x28
    c050c308:   e2430004    sub r0, r3, #4
    c050c30c:   e1510002    cmp r1, r2
    c050c310:   35933008    ldrcc   r3, [r3, #8]
    c050c314:   3a000001    bcc c050c320 <binder_get_ref+0x24>
    c050c318:   912fff1e    bxls    lr
    c050c31c:   e5933004    ldr r3, [r3, #4]
    c050c320:   e3530000    cmp r3, #0
    c050c324:   1afffff6    bne c050c304 <binder_get_ref+0x8>
    c050c328:   e1a00003    mov r0, r3
    c050c32c:   e12fff1e    bx  lr
    

    analysis: check if ref rb tree is memory corrupted
    Traverse the ref rb tree to find the rb node whose desc is 0x1cc. If the root cause were memory corruption, the traversal would fail at the same place as the kernel panic. However, the traversal succeeds and an rb node with desc 0x1cc is found, so the root cause does not seem to be memory corruption. Let’s check the assembly code for further analysis.
    binder_panic_01

    analysis: check registers and assembly code
    Since r2 = 0x1cb, this panic happens while traversing from the rb node with desc 0x1cb to the rb node with desc 0x1cd.

    Given desc = 0x1cc, ref->desc = 0x1cb, and n = 0xda231584: since desc > ref->desc, the code takes n = n->rb_right, so n = [0xda231584 + 0x4] = [0xda231588] = 0xda818a84.

    One possible cause of this issue is a bit flip in the CPU caches: the load of [0xda231588] effectively becomes a load of [0xdaa31588] = 0xe714c7dd, which is equal to r3.
    0xda231588 = 0b11011010001000110001010110001000
    0xdaa31588 = 0b11011010101000110001010110001000
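    A quick standalone check confirms that these two addresses differ in exactly one bit (bit 23), which is what makes a single bit flip a plausible explanation:

    #include <bitset>
    #include <cstdint>
    #include <cstdio>

    int main() {
        uint32_t expected = 0xda231588;   // the address the code should have loaded from
        uint32_t observed = 0xdaa31588;   // the address that explains the bad r3 value
        uint32_t diff = expected ^ observed;
        printf("xor = 0x%08x, differing bits = %zu\n",
               diff, std::bitset<32>(diff).count());   // xor = 0x00800000, 1 bit
        return 0;
    }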

    binder_panic_02

    c050c304:   e5932028    ldr r2, [r3, #40]   ; 0x28 
    => r2 = [0xda2315ac] = 0x1cb
    c050c308:   e2430004    sub r0, r3, #4
    => r0 = 0xda231580
    => since 0x1cb < 0x1cc, traverse the right child.
    c050c31c:   e5933004    ldr r3, [r3, #4]
    => r3 = [0xda231588] = 0xda818a84 
    => However, a bit flip happens => r3 = [0xdaa31588] = e714c7dd
    c050c320:   e3530000    cmp r3, #0
    c050c324:   1afffff6    bne c050c304 <binder_get_ref+0x8>
    c050c304:   e5932028    ldr r2, [r3, #40]   ; 0x28
    => Unable to handle kernel paging request at virtual address e714c805
    

    conclusion
    In this case, a kernel panic happens at binder_get_ref. First, we traverse the ref rb tree to check whether it is memory corrupted; the result shows that it is not. With further analysis of the registers and the assembly code of binder_get_ref, one possible root cause is a bit flip in the CPU caches while the memory content itself is correct. So it is possibly a hardware issue rather than a software issue.

