Archive for the ‘native crash’ Category

android: arm64: how to analyze the call stack after a process hits a native crash

October 28, 2015

This post is to analyze the call stack after a process hits a native crash in Android.

testing environment
The infrastructure code base here is LA.BF64.1.1-06510-8x94.0 with Android 5.0.0_r2(LRX21M) and Linux kernel 3.10.49. The device CPU is an ARMv8 Cortex-A53.

use gdb to get call stack from core file
In android: coredump: analyze core file with gdb, we demonstrated how to use gdb to load a core file and get the call stack of coredumptest, which dereferences a NULL pointer and hits a native crash.

(gdb) bt
#0  strlen () at bionic/libc/arch-arm64/generic/bionic/strlen.S:71
#1  0x0000005595326f00 in strlen (s=0x0) at bionic/libc/include/string.h:239
#2  test4 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:11
#3  0x0000005595326f88 in test3 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:20
#4  0x0000005595327010 in test2 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:29
#5  0x0000005595327098 in test1 () at frameworks/native/services/coredumptest/CoredumpTest.cpp:38
#6  0x0000005595326d7c in main () at frameworks/native/services/coredumptest/CoredumpTest.cpp:56

get call stack from tombstone
In addition to the core file, we could also get call stacks from the tombstone. When a 64-bit process hits a native crash, debuggerd64 will attach to the process and dump its registers and call stack into /data/tombstones/tombstone_0x, where 0 <= x <= 9.

ABI: 'arm64'
pid: 20948, tid: 20948, name: coredumptest  >>> /data/coredumptest <<< 
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 
    x0   0000000000000000  x1   0000000000000000  x2   0000007fcbe33358  x3   000000000000000a
    x4   0000000000000001  x5   0000000000000000  x6   000000000000000b  x7   0000000000000000
    x8   00000000000000a4  x9   0000000000000000  x10  0000007fcbe32f88  x11  0101010101010101
    x12  0000000000000001  x13  000000000000001e  x14  0000007faa6560f0  x15  0000007faa656100
    x16  0000005595338fb8  x17  0000007faa597424  x18  0000000000000000  x19  ffffffffffffffff
    x20  0000007fcbe33348  x21  0000000000000001  x22  0000005595326d50  x23  0000000000000000
    x24  0000000000000000  x25  0000000000000000  x26  0000000000000000  x27  0000000000000000
    x28  0000000000000000  x29  0000007fcbe33220  x30  0000005595326f00
    sp   0000007fcbe33220  pc   0000007faa597434  pstate 0000000040000000
    v0   2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e  v1   746165662e6d70642e74736973726570
    v2   6f63656e6362696c0000000000657275  v3   00000000000000000000000000000000
    v4   00000000000000008020080280000000  v5   00000000400000004000000000000000
    v6   00000000000000000000000000000000  v7   80200802802008028020080280200802
    v8   00000000000000000000000000000000  v9   00000000000000000000000000000000
    v10  00000000000000000000000000000000  v11  00000000000000000000000000000000
    v12  00000000000000000000000000000000  v13  00000000000000000000000000000000
    v14  00000000000000000000000000000000  v15  00000000000000000000000000000000
    v16  40100401401004014010040140100401  v17  00000000a00a80000000aa8000404000
    v18  00000000000000008020080280000000  v19  00000000000000000000000000000000
    v20  00000000000000000000000000000000  v21  00000000000000000000000000000000
    v22  00000000000000000000000000000000  v23  00000000000000000000000000000000
    v24  00000000000000000000000000000000  v25  00000000000000000000000000000000
    v26  00000000000000000000000000000000  v27  00000000000000000000000000000000
    v28  00000000000000000000000000000000  v29  00000000000000000000000000000000
    v30  00000000000000000000000000000000  v31  00000000000000000000000000000000
    fpsr 00000000  fpcr 00000000

backtrace:
    #00 pc 0000000000014434  /system/lib64/libc.so (strlen+16)
    #01 pc 0000000000000efc  /data/coredumptest
    #02 pc 0000000000000f84  /data/coredumptest
    #03 pc 000000000000100c  /data/coredumptest
    #04 pc 0000000000001094  /data/coredumptest
    #05 pc 0000000000000d78  /data/coredumptest (main+40)
    #06 pc 0000000000013474  /system/lib64/libc.so (__libc_init+100)
    #07 pc 0000000000000e8c  /data/coredumptest

use addr2line to analyze call stacks in tombstone
We could use addr2line to translate a symbol address into a source code function name and line number.

$ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 0000000000000ef8
0x0000000000000ef8
_Z5test4v
frameworks/native/services/coredumptest/CoredumpTest.cpp:10
$ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 0000000000000f84
0x0000000000000f84
_Z5test3v
frameworks/native/services/coredumptest/CoredumpTest.cpp:20
$ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 000000000000100c
0x000000000000100c
_Z5test2v
frameworks/native/services/coredumptest/CoredumpTest.cpp:29
$ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 0000000000001094
0x0000000000001094
_Z5test1v
frameworks/native/services/coredumptest/CoredumpTest.cpp:38
$ aarch64-linux-android-addr2line -f -e symbols/system/bin/coredumptest -a 0000000000000d78       
0x0000000000000d78
main
frameworks/native/services/coredumptest/CoredumpTest.cpp:56

review source code to see why the native crash happens
From the source code, we could find that the native crash is due to dereferencing a NULL pointer.

#define LOG_TAG "CoredumpTest"

#include <utils/Log.h>
#include <string.h>
#include <sys/resource.h>
#include <errno.h>

using namespace android;

int test4()
{
    int ret = strlen(NULL);

    ALOGD("enter %s: %d", __func__,  ret);

    return ret;
}

int test3()
{
    int ret = test4() + 3;

    ALOGD("enter %s: %d", __func__, ret);

    return ret;
}

int test2()
{
    int ret = test3() + 2;

    ALOGD("enter %s: %d", __func__, ret);

    return ret;
}

int test1()
{
    int ret = test2() + 1;

    ALOGD("enter %s: %d", __func__, ret);

    return ret;
}

int main()
{
    struct rlimit core_limit;
    core_limit.rlim_cur = RLIM_INFINITY;
    core_limit.rlim_max = RLIM_INFINITY;

    if (setrlimit(RLIMIT_CORE, &core_limit) < 0) {
        ALOGD("Failed to setrlimit: %s", strerror(errno));
        return 1;
    }

    int n = test1();
    ALOGD("Ready to enter test");

    return 0;
}

conclusion
After a process hits a native crash, we could analyze its call stack from the core file or the tombstone, and then review the source code to see why the crash happens.

android: binder: binder thread encounters native crash due to bad commands from binder driver

September 4, 2015

This post is to discuss a case in which a binder thread encounters a native crash due to bad commands from the binder driver. The reference code is a Qualcomm msm8974 release in which the frameworks is KitKat 4.4.2 and the kernel is Linux 3.4.0.

Symptom: IPCThreadState: *** BAD COMMAND 29190 received from Binder driver, and then the binder thread calls abort
For this case, I’ll introduce several binder transaction examples, and finally demonstrate under which conditions this case happens.

06-14 21:46:25.958  1067  1526 E IPCThreadState: *** BAD COMMAND 29190 received from Binder driver
06-14 21:46:25.958  1067  1526 E IPCThreadState: getAndExecuteCommand(fd=12) returned unexpected error -2147483648, aborting
06-14 21:46:25.958  1067  1526 F libc    : Fatal signal 6 (SIGABRT) at 0x0000042b (code=-6), thread 1526 (Binder_B)
pid: 1067, tid: 1526, name: Binder_B  >>> system_server <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
    r0 00000000  r1 000005f6  r2 00000006  r3 00000000
    r4 00000006  r5 0000000c  r6 000005f6  r7 0000010c
    r8 4019b081  r9 81c4a000  sl 81c49bac  fp 4013f2f4
    ip 00000000  sp 81d47cc0  lr 40103c79  pc 40112c70  cpsr 000f0010
    d0  0000000000000000  d1  0000000000000000
    d2  0000000000000000  d3  0000000000000000
    d4  3a30333030303030  d5  3730303936303020
    d6  3030653630302034  d7  3038363030206636
    d8  0000000000000000  d9  0000000000000000
    d10 0000000000000000  d11 0000000000000000
    d12 0000000000000000  d13 0000000000000000
    d14 0000000000000000  d15 0000000000000000
    d16 6e72757465722029  d17 7078656e75206465
    d18 520a30203a657a69  d19 2064657669656365
    d20 73646e616d6d6f63  d21 7264206d6f726620
    d22 3030203a72657669  d23 3020633032373030
    d24 3f56c16c16c76a94  d25 3f81111111185da8
    d26 3fa555555555551c  d27 3fc55555555554db
    d28 3fe0000000000000  d29 0000000000000001
    d30 fff0000000000000  d31 4000000000000000
    scr 68000011

backtrace:
    #00  pc 00022c70  /system/lib/libc.so (tgkill+12)
    #01  pc 00013c75  /system/lib/libc.so (pthread_kill+48)
    #02  pc 00013e89  /system/lib/libc.so (raise+10)
    #03  pc 00012a23  /system/lib/libc.so
    #04  pc 00022524  /system/lib/libc.so (abort+4)
    #05  pc 00021809  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+124)
    #06  pc 0002658d  /system/lib/libbinder.so
    #07  pc 00012635  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
    #08  pc 0004fdf7  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+98)
    #09  pc 000120b7  /system/lib/libutils.so
    #10  pc 0000d2c0  /system/lib/libc.so (__thread_entry+72)
    #11  pc 0000d458  /system/lib/libc.so (pthread_create+240)

preliminary: example#1: an asynchronous binder transaction

  1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).

transaction_01

preliminary: example#2: a synchronous binder transaction

  1. The client thread(3684:3684) requests a synchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).

transaction_02

preliminary: example#3: an asynchronous binder transaction which triggers another synchronous binder transaction

  1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).
  2. The client thread(1067:1526) requests a synchronous binder transaction to a binder service provided by the process(387). This transaction is served by the binder thread(387:1008).

transaction_03

preliminary: example#4: two asynchronous binder transactions to the same service, each further triggering another binder transaction

  1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the thread(1067:1526).
  2. The client thread(1067:1526) requests a synchronous binder transaction to a binder service provided by the process(387). This transaction is served by the binder thread(387:1008).
  3. The client thread(1726:1726) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the thread(1067:1526).
  4. The client thread(1067:1526) requests an asynchronous binder transaction to a binder service provided by the process(4267). This transaction is served by the binder thread(4267:5488).

transaction_04

Why is the asynchronous transaction requested by the client thread(1726:1726) blocked until the asynchronous transaction requested by the client thread(3684:3684) is completed by the binder thread(1067:1526)?

  • Both the client thread(3684:3684) and the client thread(1726:1726) request an asynchronous transaction to the same binder service provided by the process(1067).
  • The asynchronous transaction requested by the client thread(1726:1726) is en-queued into the binder service’s todo list, since another asynchronous transaction to the same binder service, requested by the client thread(3684:3684), is being served.
  • The en-queued asynchronous transaction is moved into the binder thread(1067:1526)’s todo list after the binder thread(1067:1526) completes the asynchronous transaction requested by the client thread(3684:3684).
  • The binder thread(1067:1526) de-queues and executes the asynchronous binder transaction requested by the client thread(1726:1726).

analysis: what could make this symptom happen
If a BC_FREE_BUFFER command is sent as in the diagram below, then the following happens:

  1. The binder thread(1726:1726) receives two BR_TRANSACTION commands consecutively.
  2. The binder thread(1726:1726) sends two BC_TRANSACTION commands consecutively.
  3. The binder thread(1726:1726) receives two BR_TRANSACTION_COMPLETE commands consecutively.

This symptom happens when the binder thread receives two BR_TRANSACTION_COMPLETE commands consecutively. So the problem reduces to why the binder thread(1726:1726) sends BC_FREE_BUFFER unexpectedly.
transaction_05

analysis: why the binder thread(1726:1726) requests BC_FREE_BUFFER unexpectedly
The BC_FREE_BUFFER command is executed in processPendingDerefs(), in which some binder references are cleared and a destructor frees a binder buffer. This bug doesn’t appear in AOSP or in the pure vendor release.

void IPCThreadState::joinThreadPool(bool isMain)
{
    LOG_THREADPOOL("**** THREAD %p (PID %d) IS JOINING THE THREAD POOL\n", (void*)pthread_self(), getpid());
    mOut.writeInt32(isMain ? BC_ENTER_LOOPER : BC_REGISTER_LOOPER);
    
    // This thread may have been spawned by a thread that was in the background
    // scheduling group, so first we will make sure it is in the foreground
    // one to avoid performing an initial transaction in the background.
    set_sched_policy(mMyThreadId, SP_FOREGROUND);
        
    status_t result;
    do {
        processPendingDerefs();
        // now get the next command to be processed, waiting if necessary
        result = getAndExecuteCommand();
        if (result < NO_ERROR && result != TIMED_OUT && result != -ECONNREFUSED && result != -EBADF) {
            ALOGE("getAndExecuteCommand(fd=%d) returned unexpected error %d, aborting",
                  mProcess->mDriverFD, result);
            abort();
        }
        
        // Let this thread exit the thread pool if it is no longer
        // needed and it is not the main process thread.
        if(result == TIMED_OUT && !isMain) {
            break;
        }
    } while (result != -ECONNREFUSED && result != -EBADF);
    ......
}

conclusion
In IPCThreadState::joinThreadPool(), the message loop of a binder thread, if processPendingDerefs() calls a destructor which calls freeBuffer, then it might send an unexpected BC_FREE_BUFFER command. The unexpected BC_FREE_BUFFER causes two consecutive BR_TRANSACTION commands, two consecutive BC_TRANSACTION commands, and two consecutive BR_TRANSACTION_COMPLETE commands. Finally, the binder thread detects a BAD COMMAND and calls abort to exit the process.

android: binder: arm: SIGSEGV at IPCThreadState::executeCommand in libbinder.so

August 30, 2015

This post is to discuss a case in which a segmentation-fault native crash happens in binder transactions. The source code is a Qualcomm release in which the frameworks is android-4.4.3_KTU84L KitKat and the kernel is Linux 3.10.28. The architecture is a quad-core ARM Cortex-A53.

symptom: segmentation fault at IPCThreadState::executeCommand in libbinder.so

pid: 308, tid: 464, name: Binder_1  >>> /system/bin/surfaceflinger <<<
thread: Binder_1
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
    r0 00000000  r1 b70724f4  r2 00000001  r3 00000003
    r4 b6fad0f0  r5 00000000  r6 b6f4538c  r7 000003e8
    r8 00000134  r9 b64d1000  sl bea4d85c  fp b6f432f4
    ip 000003e8  sp b65cec78  lr b6d6acb3  pc b6d676ec  cpsr 000f0030
    d0  6f207463656a6266  d1  64696f76202c6366
    d2  61503a3a64696f73  d3  65723a3a6c656365
    d4  746e69752074736e  d5  6973202c2a745f38
    d6  6f63202c745f657a  d7  657a69732074736e
    d8  0000000000000000  d9  0000000000000000
    d10 0000000000000000  d11 0000000000000000
    d12 0000000000000000  d13 0000000000000000
    d14 0000000000000000  d15 0000000000000000
    d16 41cdcd6500000000  d17 416fca0560000000
    d18 412e848000000000  d19 0000000000000000
    d20 3ff9d07c84b5dcc6  d21 40cf830000000000
    d22 3efa019fabb79d95  d23 3f2a019f8723aeaa
    d24 3f56c16c16c76a94  d25 3f81111111185da8
    d26 40cf830000000000  d27 3fc55555555554db
    d28 3fe0000000000000  d29 0000000000000001
    d30 fff0000000000000  d31 4000000000000000
    scr 68000010

backtrace:
    #00  pc 000206ec  /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+587)
    #01  pc 00020a97  /system/lib/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+38)
    #02  pc 00020b0f  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+50)
    #03  pc 00025935  /system/lib/libbinder.so
    #04  pc 00012671  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
    #05  pc 000120f3  /system/lib/libutils.so
    #06  pc 0000d360  /system/lib/libc.so (__thread_entry+72)
    #07  pc 0000d4f8  /system/lib/libc.so (pthread_create+240)

preliminary: binder thread and IPCThreadState
After opening /dev/binder, a process becomes a binder process and has up to 16 binder threads. Each binder thread has its own thread-local IPCThreadState object, which could be referenced with IPCThreadState::self(). IPCThreadState::self() provides all the infrastructure a binder thread needs.

After being spawned, a binder thread calls joinThreadPool to enter its message loop. transact initiates a transaction, in which executeCommand is carried out. talkWithDriver is the low-level function that issues ioctl on /dev/binder. Finally, shutdown exits a binder thread.

class IPCThreadState
{
public:
    static  IPCThreadState*     self();
    static  IPCThreadState*     selfOrNull();  // self(), but won't instantiate
            sp<ProcessState>    process();
            void                joinThreadPool(bool isMain = true);
            status_t            transact(int32_t handle,
                                         uint32_t code, const Parcel& data,
                                         Parcel* reply, uint32_t flags);
    static  void                shutdown();
private:
            status_t            talkWithDriver(bool doReceive=true);
            status_t            executeCommand(int32_t command);
            Parcel              mIn;
            Parcel              mOut;

preliminary: how parcel and binder transaction works
A binder transaction is an IPC transaction. On the client side, one thread of the client process writes transaction data into the parcel IPCThreadState::self()->mOut and requests a binder transaction via ioctl on /dev/binder. This thread enters kernel space and allocates a binder transaction whose data is copied from the parcel mOut. Finally, the binder transaction data is en-queued into the todo list of the target process.

On the server side, a binder thread is blocked at ioctl on /dev/binder. When a transaction is en-queued into the todo list of the process, the binder thread enters kernel space and copies the binder transaction data into IPCThreadState::self()->mIn.

ipcthreadstate_parcel

(gdb) p mIn      
$33 = {mError = 0, mData = 0xb6fad198 "\fr", mDataSize = 48, mDataCapacity = 256, mDataPos = 48, mObjects = 0x0, mObjectsSize = 0, mObjectsCapacity = 0, mNextObjectHint = 0, mFdsKnown = true, 
  mHasFds = false, mAllowFds = true, mOwner = 0, mOwnerCookie = 0x2e}

analyze: what binder thread is doing while native crash
In this case, the binder thread Binder_1(tid=464) of the process /system/bin/surfaceflinger(pid=308) encounters a segmentation fault while executing the command after reading the binder transaction from mIn.

(gdb) bt
#0  android::IPCThreadState::executeCommand (this=0xb6fad0f0, cmd=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:1088
#1  0xb6d67a9a in android::IPCThreadState::getAndExecuteCommand (this=0xb6fad0f0) at frameworks/native/libs/binder/IPCThreadState.cpp:436
#2  0xb6d67b12 in android::IPCThreadState::joinThreadPool (this=0xb6fad0f0, isMain=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:490
#3  0xb6d6c938 in android::PoolThread::threadLoop (this=0xb6fab648) at frameworks/native/libs/binder/ProcessState.cpp:69
#4  0xb6e85672 in android::Thread::_threadLoop (user=0xb6fab648) at frameworks/native/libs/utils/Threads.cpp:812
#5  0xb6e850f4 in thread_data_t::trampoline (t=<optimized out>) at frameworks/native/libs/utils/Threads.cpp:95
#6  0xb6f00364 in __thread_entry (func=0xb6e850bd <thread_data_t::trampoline(thread_data_t const*)>, arg=0xb6fab698, tls=0xb65cedd0) at bionic/libc/bionic/pthread_create.cpp:105
#7  0xb6f004fc in pthread_create (thread_out=0xbea4d85c, attr=<optimized out>, start_routine=0xb6e850bd <thread_data_t::trampoline(thread_data_t const*)>, arg=0x78)
    at bionic/libc/bionic/pthread_create.cpp:224
#8  0x00000000 in ?? ()

status_t IPCThreadState::executeCommand(int32_t cmd)
{
    BBinder* obj;
    RefBase::weakref_type* refs;
    status_t result = NO_ERROR;

    switch (cmd) {
        ......
        ......
    case BR_TRANSACTION:
        {
            binder_transaction_data tr;
            result = mIn.read(&tr, sizeof(tr));
            ALOG_ASSERT(result == NO_ERROR,
                "Not enough command data for brTRANSACTION");
            if (result != NO_ERROR) break;

            Parcel buffer;
            buffer.ipcSetDataReference(
                reinterpret_cast<const uint8_t*>(tr.data.ptr.buffer),
                tr.data_size,
                reinterpret_cast<const size_t*>(tr.data.ptr.offsets),
                tr.offsets_size/sizeof(size_t), freeBuffer, this);

            const pid_t origPid = mCallingPid;
            const uid_t origUid = mCallingUid;

            mCallingPid = tr.sender_pid;
            mCallingUid = tr.sender_euid;
            ......
            if (tr.target.ptr) {
                sp<BBinder> b((BBinder*)tr.cookie);
                const status_t error = b->transact(tr.code, buffer, &reply, tr.flags);
                if (error < NO_ERROR) reply.setError(error);

            } else {
                const status_t error = the_context_object->transact(tr.code, buffer, &reply, tr.flags);
                if (error < NO_ERROR) reply.setError(error);
            }
    }
}

status_t Parcel::read(void* outData, size_t len) const
{
    if ((mDataPos+PAD_SIZE(len)) >= mDataPos && (mDataPos+PAD_SIZE(len)) <= mDataSize
            && len <= PAD_SIZE(len)) {
        memcpy(outData, mData+mDataPos, len);
        mDataPos += PAD_SIZE(len);
        ALOGV("read Setting data pos of %p to %d\n", this, mDataPos);
        return NO_ERROR;
    }    
    return NOT_ENOUGH_DATA;
}

analyze: check if mIn transmitted from binder driver is reasonable
The parcel mIn of the binder thread Binder_1(tid=464) looks reasonable. The binder request is from pid=1304 with euid=10071. This binder transaction is a two-way transaction and allows replies with file descriptors. The binder thread will execute the function with code 3 of the requested service 0xb7036d20.

(gdb) x/12x mIn.mData
0xb6fad198:     0x0000720c      0x80287202      0xb7036d20      0xb70724f4
0xb6fad1a8:     0x00000003      0x00000010      0x00000518      0x00002757
0xb6fad1b8:     0x00000064      0x00000000      0xb65cf028      0xb65cf08c
(gdb) p *((struct binder_transaction_data*)(mIn.mData+0x08))             
$40 = {target = {handle = 3070455072, ptr = 0xb7036d20}, cookie = 0xb70724f4, code = 3, flags = 16, sender_pid = 1304, sender_euid = 10071, data_size = 100, offsets_size = 0, data = {ptr = {
      buffer = 0xb65cf028, offsets = 0xb65cf08c}, buf = "(\360\\\266\214\360\\\266"}}

binder_transaction_data

analyze: check if tr copied from mIn.mData+8 is corrupted
The binder thread Binder_1(tid=464) copies 40 bytes from mIn.mData+8 into the local variable tr, then executes the binder transaction with tr as the argument. The expected content of tr is shown below.
binder_executecommand_tr_01

However, the coredump shows that the tr content is mIn.mData+8 shifted by 4 bytes. Now the problem reduces to why memcpy copied only 36 bytes with a 4-byte shift when it was supposed to copy 40 bytes of data. Also, whenever this case happens, it always appears that dst[3:0] = 4, such as 0xb65cecf4.
binder_executecommand_tr_02

(gdb) x/12x mIn.mData
0xb6fad198:     0x0000720c      0x80287202      0xb7036d20      0xb70724f4
0xb6fad1a8:     0x00000003      0x00000010      0x00000518      0x00002757
0xb6fad1b8:     0x00000064      0x00000000      0xb65cf028      0xb65cf08c
(gdb) x/12x &tr 
0xb65cecf4:     0x00000000      0xb7036d20      0xb70724f4      0x00000003
0xb65ced04:     0x00000010      0x00000518      0x00002757      0x00000064
0xb65ced14:     0x00000000      0xb65cf028      0xac59912d      0xb6fad0f0

preliminary: how bionic memcpy works on cortex-a53

According to cortex-a53.mk, the cortex-a53 implementation of memcpy is in bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S.

include bionic/libc/arch-arm/cortex-a15/cortex-a15.mk
  • initial condition
    • R0 is dst of memcpy
      R1 is src of memcpy
      Assume that while this issue happens, dst[3:0] = 4

    memcpy_40bytes_01

  • copy 4 bytes to make r0(dest) 8 byte alignment
  • // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]!
    

    memcpy_40bytes_02

  • copy 8 bytes to make r0 16 byte alignment
  • // Copies 8 bytes, dst 64 bits aligned before, at least 128 bits after.
    vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0, :64]!
    

    memcpy_40bytes_03

  • copy 16 bytes
  • // Copies 16 bytes, destination 128 bits aligned.
    vld1.8      {d0, d1}, [r1]!
    vst1.8      {d0, d1}, [r0, :128]!
    

    memcpy_40bytes_04

  • src remaining is less than 16 bytes, copy 8 bytes
  • vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0]!
    

    memcpy_40bytes_05

  • src remaining is less than 8 bytes, copy 4 bytes
  • 1:      bge         2f
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0]!
    

    memcpy_40bytes_06

    Let (int8_t*)a = &mIn.mData[8] and dst[3:0] = 0x4; the content of SIMD registers D0, D1, D2, and D3 should be as below after memcpy copies 40 bytes.

    d0 = {a[36], a[29], a[30], a[31], a[32], a[33], a[34], a[35]}
    d1 = {a[37], a[21], a[22], a[23], a[24], a[25], a[26], a[27]} 
    d2 = {a[38], ……}
    d3 = {a[39], ……}
    

    analysis: possible reason to make memcpy copy only 36 bytes with a 4-byte shift
    One possible reason for memcpy copying only 36 bytes with a 4-byte shift is that line 108 of bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S is skipped by the CPU while the other instructions are executed correctly. Let’s check what happens if this line is skipped.

     // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]!
    
  • initial condition
    • R0 is dst of memcpy
      R1 is src of memcpy
      Assume that while this issue happens, dst[3:0] = 4

    memcpy_40bytes_01

  • copy 4 bytes to make r0(dest) 8 byte alignment
  • // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]! -> skipped
    

    memcpy_40bytes_02

  • copy 8 bytes to make r0 16 byte alignment
  • // Copies 8 bytes, dst 64 bits aligned before, at least 128 bits after.
    vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0, :64]!
    

    memcpy_40bytes_03

  • copy 16 bytes
  • // Copies 16 bytes, destination 128 bits aligned.
    vld1.8      {d0, d1}, [r1]!
    vst1.8      {d0, d1}, [r0, :128]!
    

    memcpy_40bytes_04

  • src remaining is less than 16 bytes, copy 8 bytes
  • vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0]!
    

    memcpy_40bytes_05

  • src remaining is less than 8 bytes, copy 4 bytes
  • 1:      bge         2f
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0]!
    

    memcpy_40bytes_06

    If line 108 is indeed skipped, let (int8_t*)a = &mIn.mData[8]; the content of SIMD registers D0, D1, D2, and D3 after memcpy should be as below.

    d0 = {a[32], a[25], a[26], a[27], a[28], a[29], a[30], a[31]}
    d1 = {a[33], a[17], a[18], a[19], a[20], a[21], a[22], a[23]} 
    d2 = {a[34], ……}
    d3 = {a[35], ……}
    

    analysis: how to prove line 108 of bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S is skipped
    The content of the d0, d1, d2, and d3 registers could help prove whether line 108 is skipped. However, by the time the native crash happens, these registers have already been reused. We couldn’t see the content of d0, d1, d2, and d3 right after memcpy. Thus, I use a debug patch to force a native crash if memcpy doesn’t work correctly.

    The resulting coredump and tombstone show that the values of d0, d1, d2, and d3 are the same as when line 108 is skipped. So I believe this assumption is true.
    d0 = {a[32], a[25], a[26], a[27], a[28], a[29], a[30], a[31]}
    d1 = {a[33], a[17], a[18], a[19], a[20], a[21], a[22], a[23]}
    d2 = {a[34], ……}
    d3 = {a[35], ……}

    pid: 975, tid: 1315, name: Binder_5  >>> system_server <<<
    thread: Binder_5
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbeef
        r0 00012345  r1 5ed5ed39  r2 00000004  r3 deadbeef
        r4 614e6958  r5 00000028  r6 00000028  r7 0000000a
        r8 5f978cc4  r9 0000000a  sl 401f7206  fp 4015e2f4
        ip 000000c0  sp 5f978c28  lr 401f2a5f  pc 401f2a64  cpsr 800f0030
        d0  0000000000000108  d1  0000272b000031c3
        d2  0061002e00650068  d3  006f007200640059
        d4  006c007500610066  d5  ffffffff00000074
        d6  0000000000000025  d7  ffffffff00000001
        d8  0000000000000000  d9  0000000000000000
        d10 0000000000000000  d11 0000000000000000
        d12 0000000000000000  d13 0000000000000000
        d14 0000000000000000  d15 0000000000000000
        d16 0004000400040004  d17 0004000400040004
        d18 0002000200020002  d19 0002000200020002
        d20 0000000000000000  d21 0000000000000000
        d22 0000000000000000  d23 0000000000000000
        d24 0080008000800080  d25 0080008000800080
        d26 0080008000800080  d27 0080008000800080
        d28 0080008000800080  d29 0080008000800080
        d30 0000000000000000  d31 8080808080808080
        scr 68000011
    
    backtrace:
        #00  pc 00023a64  /system/lib/libbinder.so (android::Parcel::read(void*, unsigned int) const+151)
        #01  pc 000206d9  /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+424)
        #02  pc 00020b4f  /system/lib/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+38)
        #03  pc 00020bd7  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+66)
        #04  pc 00025ac1  /system/lib/libbinder.so
        #05  pc 00012671  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
        #06  pc 0004fb4f  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+98)
        #07  pc 000120f3  /system/lib/libutils.so
        #08  pc 0000d360  /system/lib/libc.so (__thread_entry+72)
        #09  pc 0000d4f8  /system/lib/libc.so (pthread_create+240)
    
    (gdb) bt
    #0  0x401f2a64 in android::Parcel::read (this=0x614e6958, outData=0x5f978cc4, len=40) at vendor/htc/native/libs/binder/Parcel.cpp:932
    #1  0x401ef6dc in android::IPCThreadState::executeCommand (this=0x614e6928, cmd=-2144833022) at vendor/htc/native/libs/binder/IPCThreadState.cpp:1059
    #2  0x401efb52 in android::IPCThreadState::getAndExecuteCommand (this=0x614e6928) at vendor/htc/native/libs/binder/IPCThreadState.cpp:448
    #3  0x401efbda in android::IPCThreadState::joinThreadPool (this=0x0, isMain=<optimized out>) at vendor/htc/native/libs/binder/IPCThreadState.cpp:504
    #4  0x5f87b000 in ?? ()
    #5  0x5f87b000 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb) x/12 mData
    0x5ed5ed30:     0x0000720c      0x80287202      0x54ad22c0      0x547d2cf0
    0x5ed5ed40:     0x0000001d      0x00000010      0x000031b3      0x0000272b
    0x5ed5ed50:     0x00000110      0x00000000      0x5968c308      0x5968c418
    (gdb) x/12 outData
    0x5f978cc4:     0x00000000      0x54ad22c0      0x547d2cf0      0x0000001d
    0x5f978cd4:     0x00000010      0x000031b3      0x0000272b      0x00000110
    0x5f978ce4:     0x00000000      0x5968c308      0x62efbc00      0x614e692
    

    analysis: why an assembly instruction is skipped by the CPU
    At first, I thought it was a hardware issue. After working with vendor developers, we found the root cause: something is wrong in the exception handler.

    When a NEON instruction is executed in user space, an exception may be raised and the CPU enters an exception handler. Usually, the exception handler deals with the exception, decreases the PC by 4, and returns to the context where the exception happened. In this case, something is wrong in the exception handler: the CPU returns to the user-space context without decreasing the PC by 4, so a NEON instruction is skipped.

    conclusion
    Usually, an exception handler deals with the exception, decreases the PC by 4, and returns to the context where the exception happened. If something is wrong in the exception handler and the CPU returns to the original context without decreasing the PC by 4, then an instruction is skipped. In this case, a NEON instruction in memcpy is skipped, which makes the process hit a native crash in binder transactions.

