Archive for the ‘arm’ Category

android: binder: arm: SIGSEGV at IPCThreadState::executeCommand in libbinder.so

August 30, 2015

This post is to discuss a case in which a segmentation native crash happens in binder transactions. The source code is qualcomm release in which frameworks is android-4.4.3_KTU84L KitKat and kernel is inux 3.10.28. The architecture is Quad-core ARM Cortex A53.

symptom: segmentation at IPCThreadState::executeCommand in libbinder.so

pid: 308, tid: 464, name: Binder_1  >>> /system/bin/surfaceflinger <<<
thread: Binder_1
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
    r0 00000000  r1 b70724f4  r2 00000001  r3 00000003
    r4 b6fad0f0  r5 00000000  r6 b6f4538c  r7 000003e8
    r8 00000134  r9 b64d1000  sl bea4d85c  fp b6f432f4
    ip 000003e8  sp b65cec78  lr b6d6acb3  pc b6d676ec  cpsr 000f0030
    d0  6f207463656a6266  d1  64696f76202c6366
    d2  61503a3a64696f73  d3  65723a3a6c656365
    d4  746e69752074736e  d5  6973202c2a745f38
    d6  6f63202c745f657a  d7  657a69732074736e
    d8  0000000000000000  d9  0000000000000000
    d10 0000000000000000  d11 0000000000000000
    d12 0000000000000000  d13 0000000000000000
    d14 0000000000000000  d15 0000000000000000
    d16 41cdcd6500000000  d17 416fca0560000000
    d18 412e848000000000  d19 0000000000000000
    d20 3ff9d07c84b5dcc6  d21 40cf830000000000
    d22 3efa019fabb79d95  d23 3f2a019f8723aeaa
    d24 3f56c16c16c76a94  d25 3f81111111185da8
    d26 40cf830000000000  d27 3fc55555555554db
    d28 3fe0000000000000  d29 0000000000000001
    d30 fff0000000000000  d31 4000000000000000
    scr 68000010

backtrace:
    #00  pc 000206ec  /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+587)
    #01  pc 00020a97  /system/lib/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+38)
    #02  pc 00020b0f  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+50)
    #03  pc 00025935  /system/lib/libbinder.so
    #04  pc 00012671  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
    #05  pc 000120f3  /system/lib/libutils.so
    #06  pc 0000d360  /system/lib/libc.so (__thread_entry+72)
    #07  pc 0000d4f8  /system/lib/libc.so (pthread_create+240)

preliminary: binder thread and IPCThreadState
After opening /dev/binder, a process becomes a binder process and has up to 16 binder threads. Each binder thread has its own thread local IPCThreadState object which could be reference with IPCThreadState::self(). The IPCThreadState:self() provides all the infrastructures for the binder thread to do whatever he or she wants to do.

After spawned, a binder thread calls joinThreadPool to enter loops. transact initiate a transaction in which executeCommand is carried out. The talkWithDriver is the low level functions to ioctl /dev/binder. Finally, shutdown is to exit a binder thread.

class IPCThreadState
{
public:
    static  IPCThreadState*     self();
    static  IPCThreadState*     selfOrNull();  // self(), but won't instantiate
            sp<ProcessState>    process();
            void                joinThreadPool(bool isMain = true);
            status_t            transact(int32_t handle,
                                         uint32_t code, const Parcel& data,
                                         Parcel* reply, uint32_t flags);
    static  void                shutdown();
private:
            status_t            talkWithDriver(bool doReceive=true);
            status_t            executeCommand(int32_t command);
            Parcel              mIn;
            Parcel              mOut;

preliminary: how parcel and binder transaction works
Binder transaction is an IPC transaction. On the client side, one thread of the client process write transaction data into the parcel IPCThreadState()->self()->mOut and requests a binder transaction via ioctl of /dev/binder. This thread will enter kernel space and allocate a binder transaction data whose data is copied from the parcel mOut. Finally, the binder transaction data is en-queued into the todo list of the target process.

On the server side, a binder thread is blocked at ioctl of /dev/binder. If the todo list of the process is en-queued, then the binder thread will enter kernel space and copy the binder transaction data into the IPCThreadState()->self()->mIn.

ipcthreadstate_parcel

(gdb) p mIn      
$33 = {mError = 0, mData = 0xb6fad198 "\fr", mDataSize = 48, mDataCapacity = 256, mDataPos = 48, mObjects = 0x0, mObjectsSize = 0, mObjectsCapacity = 0, mNextObjectHint = 0, mFdsKnown = true, 
  mHasFds = false, mAllowFds = true, mOwner = 0, mOwnerCookie = 0x2e}

analyze: what binder thread is doing while native crash
In this case, the binder thread Binder_1(tid=464) of the process /system/bin/surfaceflinger(pid=308) encounters segmentation while executing the command after reading binder transaction from mIn.

(gdb) bt
#0  android::IPCThreadState::executeCommand (this=0xb6fad0f0, cmd=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:1088
#1  0xb6d67a9a in android::IPCThreadState::getAndExecuteCommand (this=0xb6fad0f0) at frameworks/native/libs/binder/IPCThreadState.cpp:436
#2  0xb6d67b12 in android::IPCThreadState::joinThreadPool (this=0xb6fad0f0, isMain=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:490
#3  0xb6d6c938 in android::PoolThread::threadLoop (this=0xb6fab648) at frameworks/native/libs/binder/ProcessState.cpp:69
#4  0xb6e85672 in android::Thread::_threadLoop (user=0xb6fab648) at frameworks/native/libs/utils/Threads.cpp:812
#5  0xb6e850f4 in thread_data_t::trampoline (t=<optimized out>) at frameworks/native/libs/utils/Threads.cpp:95
#6  0xb6f00364 in __thread_entry (func=0xb6e850bd <thread_data_t::trampoline(thread_data_t const*)>, arg=0xb6fab698, tls=0xb65cedd0) at bionic/libc/bionic/pthread_create.cpp:105
#7  0xb6f004fc in pthread_create (thread_out=0xbea4d85c, attr=<optimized out>, start_routine=0xb6e850bd <thread_data_t::trampoline(thread_data_t const*)>, arg=0x78)
    at bionic/libc/bionic/pthread_create.cpp:224
#8  0x00000000 in ?? ()
status_t IPCThreadState::executeCommand(int32_t cmd)
{
    BBinder* obj;
    RefBase::weakref_type* refs;
    status_t result = NO_ERROR;

    switch (cmd) {
        ......
        ......
    case BR_TRANSACTION:
        {
            binder_transaction_data tr;
            result = mIn.read(&tr, sizeof(tr));
            ALOG_ASSERT(result == NO_ERROR,
                "Not enough command data for brTRANSACTION");
            if (result != NO_ERROR) break;

            Parcel buffer;
            buffer.ipcSetDataReference(
                reinterpret_cast<const uint8_t*>(tr.data.ptr.buffer),
                tr.data_size,
                reinterpret_cast<const size_t*>(tr.data.ptr.offsets),
                tr.offsets_size/sizeof(size_t), freeBuffer, this);

            const pid_t origPid = mCallingPid;
            const uid_t origUid = mCallingUid;

            mCallingPid = tr.sender_pid;
            mCallingUid = tr.sender_euid;
            ......
            if (tr.target.ptr) {
                sp<BBinder> b((BBinder*)tr.cookie);
                const status_t error = b->transact(tr.code, buffer, &reply, tr.flags);
                if (error < NO_ERROR) reply.setError(error);

            } else {
                const status_t error = the_context_object->transact(tr.code, buffer, &reply, tr.flags);
                if (error < NO_ERROR) reply.setError(error);
            }
    }
}
status_t Parcel::read(void* outData, size_t len) const
{
    if ((mDataPos+PAD_SIZE(len)) >= mDataPos && (mDataPos+PAD_SIZE(len)) <= mDataSize
            && len <= PAD_SIZE(len)) {
        memcpy(outData, mData+mDataPos, len);
        mDataPos += PAD_SIZE(len);
        ALOGV("read Setting data pos of %p to %d\n", this, mDataPos);
        return NO_ERROR;
    }    
    return NOT_ENOUGH_DATA;
}

analyze: check if mIn transmitted from binder driver is reasonable
The parcel mIn of the binder thread Binder_1(tid=464) looks reasonable. The binder request is from tid=10071 of pid=1304. This binder transaction is a two-way transaction and allow replies with file descriptors. The binder thread will execute FIRST_CALL_TRANSACTION + 3 functions of the requested service 0xb7036d20.

(gdb) x/12x mIn.mData
0xb6fad198:     0x0000720c      0x80287202      0xb7036d20      0xb70724f4
0xb6fad1a8:     0x00000003      0x00000010      0x00000518      0x00002757
0xb6fad1b8:     0x00000064      0x00000000      0xb65cf028      0xb65cf08c
(gdb) p *((struct binder_transaction_data*)(mIn.mData+0x08))             
$40 = {target = {handle = 3070455072, ptr = 0xb7036d20}, cookie = 0xb70724f4, code = 3, flags = 16, sender_pid = 1304, sender_euid = 10071, data_size = 100, offsets_size = 0, data = {ptr = {
      buffer = 0xb65cf028, offsets = 0xb65cf08c}, buf = "(\360\\\266\214\360\\\266"}}

binder_transaction_data

analyze: check if tr copied from mIn.mData+8 is corrupted
The binder thread Binder_1(tid=464) copies from mIn.mData+8 to local variable tr for 40 bytes. Then it will execute binder transaction and uses tr as the argument. The expected content of is below.
binder_executecommand_tr_01

However, coredump shows that tr content is 4 bytes shift of mIn.mData+8. Now the problem reduce to why memcpy copies only 36 bytes with 4 bytes shift while it is supposed to copy 40 bytes data. Also while this case happens, it always appears that dst[3:0] = 4, such as 0xb65cecf4.
binder_executecommand_tr_02

(gdb) x/12x mIn.mData
0xb6fad198:     0x0000720c      0x80287202      0xb7036d20      0xb70724f4
0xb6fad1a8:     0x00000003      0x00000010      0x00000518      0x00002757
0xb6fad1b8:     0x00000064      0x00000000      0xb65cf028      0xb65cf08c
(gdb) x/12x &tr 
0xb65cecf4:     0x00000000      0xb7036d20      0xb70724f4      0x00000003
0xb65ced04:     0x00000010      0x00000518      0x00002757      0x00000064
0xb65ced14:     0x00000000      0xb65cf028      0xac59912d      0xb6fad0f0

preliminary: how bionic memcpy works in coretex-a53

According to cortex-a53.mk, the implementation of memcpy in cortex-a53 is in bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S.

include bionic/libc/arch-arm/cortex-a15/cortex-a15.mk
  • initial condition
    • R0 is dst of memcpy
      R1 is src of memcpy
      Assume that while this issue happens, dst[3:0] = 4

    memcpy_40bytes_01

  • copy 4 bytes to make r0(dest) 8 byte alignment
  • // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]!
    

    memcpy_40bytes_02

  • copy 8 bytes to make r0 16 byte alignment
  • // Copies 8 bytes, dst 64 bits aligned before, at least 128 bits after.
    vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0, :64]!
    

    memcpy_40bytes_03

  • copy 16 bytes
  • // Copies 16 bytes, destination 128 bits aligned.
    vld1.8      {d0, d1}, [r1]!
    vst1.8      {d0, d1}, [r0, :128]!
    

    memcpy_40bytes_04

  • src remaining is less than 16 bytes, copy 8 bytes
  • vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0]!
    

    memcpy_40bytes_05

  • src remaining is less than 8 bytes, copy 4 bytes
  • 1:      bge         2f
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0]!
    

    memcpy_40bytes_06

    Let (int8_t*)a = &mIn.mData[8] and dst[3:0]=0x4, the content of SIMD registers D0, D1, D2, and D3 should be below after memcpy 40 bytes.

    d0 = {a[36], a[29], a[30], a[31], a[32], a[33], a[34], a[35]}
    d1 = {a[37], a[21], a[22], a[23], a[24], a[25], a[26], a[27]} 
    d2 = {a[38], ……}
    d3 = {a[39], ……}
    

    analysis: possible reason to make memcpy copy only 36 bytes with 4 bytes shift
    One possible reason to make memcpy copy only 36 bytes with 4 bytes shift is that the line 108 of bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S. is skipped by CPU while the other instructions are executed correctly. Let’s check what happen if this line is skipped.

     // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]!
    
  • initial condition
    • R0 is dst of memcpy
      R1 is src of memcpy
      Assume that while this issue happens, dst[3:0] = 4

    memcpy_40bytes_01

  • copy 4 bytes to make r0(dest) 8 byte alignment
  • // Copies 4 bytes, dst 32 bits aligned before, at least 64 bits after.
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0, :32]! -> skipped
    

    memcpy_40bytes_02

  • copy 8 bytes to make r0 16 byte alignment
  • // Copies 8 bytes, dst 64 bits aligned before, at least 128 bits after.
    vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0, :64]!
    

    memcpy_40bytes_03

  • copy 16 bytes
  • // Copies 16 bytes, destination 128 bits aligned.
    vld1.8      {d0, d1}, [r1]!
    vst1.8      {d0, d1}, [r0, :128]!
    

    memcpy_40bytes_04

  • src remaining is less than 16 bytes, copy 8 bytes
  • vld1.8      {d0}, [r1]!
    vst1.8      {d0}, [r0]!
    

    memcpy_40bytes_05

  • src remaining is less than 8 bytes, copy 4 bytes
  • 1:      bge         2f
    vld4.8      {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8      {d0[0], d1[0], d2[0], d3[0]}, [r0]!
    

    memcpy_40bytes_06

    If the line 108 is true, let (int8_t*)a = &mIn.mData[8], the content of SIMD registers D0, D1, D2, and D3 after memcpy should be as below.

    d0 = {a[32], a[25], a[26], a[27], a[28], a[29], a[30], a[31]}
    d1 = {a[33], a[17], a[18], a[19], a[20], a[21], a[22], a[23]} 
    d2 = {a[34], ……}
    d3 = {a[35], ……}
    

    analysis: how to prove line 108 of bionic/libc/arch-arm/cortex-a15/bionic/memcpy_base.S is skipped
    The content of d0, d1, d2, and d3 registers could help prove if line 108 is skipped. However, while the native crash happens, these registers have already been used by others. We couldn’t see the content of d0, d1, d2 and d3 right after memcpy. Thus, I use a debug patch to enforce native crash if memcpy doesn’t work correctly.

    The result coredump and tombstone shows that the value of d0, d1, d2, and d3 are the same as when line 108 is skipped. So I believe that this assumption is true.
    d0 = {a[32], a[25], a[26], a[27], a[28], a[29], a[30], a[31]}
    d1 = {a[33], a[17], a[18], a[19], a[20], a[21], a[22], a[23]}
    d2 = {a[34], ……}
    d3 = {a[35], ……}

    pid: 975, tid: 1315, name: Binder_5  >>> system_server <<<
    thread: Binder_5
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbeef
        r0 00012345  r1 5ed5ed39  r2 00000004  r3 deadbeef
        r4 614e6958  r5 00000028  r6 00000028  r7 0000000a
        r8 5f978cc4  r9 0000000a  sl 401f7206  fp 4015e2f4
        ip 000000c0  sp 5f978c28  lr 401f2a5f  pc 401f2a64  cpsr 800f0030
        d0  0000000000000108  d1  0000272b000031c3
        d2  0061002e00650068  d3  006f007200640059
        d4  006c007500610066  d5  ffffffff00000074
        d6  0000000000000025  d7  ffffffff00000001
        d8  0000000000000000  d9  0000000000000000
        d10 0000000000000000  d11 0000000000000000
        d12 0000000000000000  d13 0000000000000000
        d14 0000000000000000  d15 0000000000000000
        d16 0004000400040004  d17 0004000400040004
        d18 0002000200020002  d19 0002000200020002
        d20 0000000000000000  d21 0000000000000000
        d22 0000000000000000  d23 0000000000000000
        d24 0080008000800080  d25 0080008000800080
        d26 0080008000800080  d27 0080008000800080
        d28 0080008000800080  d29 0080008000800080
        d30 0000000000000000  d31 8080808080808080
        scr 68000011
    
    backtrace:
        #00  pc 00023a64  /system/lib/libbinder.so (android::Parcel::read(void*, unsigned int) const+151)
        #01  pc 000206d9  /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+424)
        #02  pc 00020b4f  /system/lib/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+38)
        #03  pc 00020bd7  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+66)
        #04  pc 00025ac1  /system/lib/libbinder.so
        #05  pc 00012671  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
        #06  pc 0004fb4f  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+98)
        #07  pc 000120f3  /system/lib/libutils.so
        #08  pc 0000d360  /system/lib/libc.so (__thread_entry+72)
        #09  pc 0000d4f8  /system/lib/libc.so (pthread_create+240)
    
    (gdb) bt
    #0  0x401f2a64 in android::Parcel::read (this=0x614e6958, outData=0x5f978cc4, len=40) at vendor/htc/native/libs/binder/Parcel.cpp:932
    #1  0x401ef6dc in android::IPCThreadState::executeCommand (this=0x614e6928, cmd=-2144833022) at vendor/htc/native/libs/binder/IPCThreadState.cpp:1059
    #2  0x401efb52 in android::IPCThreadState::getAndExecuteCommand (this=0x614e6928) at vendor/htc/native/libs/binder/IPCThreadState.cpp:448
    #3  0x401efbda in android::IPCThreadState::joinThreadPool (this=0x0, isMain=<optimized out>) at vendor/htc/native/libs/binder/IPCThreadState.cpp:504
    #4  0x5f87b000 in ?? ()
    #5  0x5f87b000 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb) x/12 mData
    0x5ed5ed30:     0x0000720c      0x80287202      0x54ad22c0      0x547d2cf0
    0x5ed5ed40:     0x0000001d      0x00000010      0x000031b3      0x0000272b
    0x5ed5ed50:     0x00000110      0x00000000      0x5968c308      0x5968c418
    (gdb) x/12 outData
    0x5f978cc4:     0x00000000      0x54ad22c0      0x547d2cf0      0x0000001d
    0x5f978cd4:     0x00000010      0x000031b3      0x0000272b      0x00000110
    0x5f978ce4:     0x00000000      0x5968c308      0x62efbc00      0x614e692
    

    analysis: why an assembly code is skipped by CPU
    At first, I though it was a hardware issue. After co-work with vendor developers, the root cause is that something is wrong in exception handler.

    While an neon instruction is executed in user space, there will be an exception and CPU will enter an exception handler. Usually, an exception handler will deal with exceptions, decrease PC by 4, and return to the context where exception happens. In this case, something is wrong in exception handler, the CPU returns to user space context without decreasing PC by 4. So an neon instruction is skipped.

    Conclusion
    Usually, an exception handler will deal with exceptions, decrease PC by 4, and return to the context where exception happens. If something is wrong in exception handler and CPU returns to the original context without decrease PC by 4, then an instruction would be skipped. In this case, an neon instruction in memcpy is skipped and it makes a process hit native crash in binder transactions.

    arm: mm: kernel panic: BUG: Bad page map in process

    August 10, 2015

    This post is to discuss debugging kernel panic: BUG: Bad page map in process. The reference source code here is qualcomm msm kernel release 3.4.0. In the case discussed here, CONFIG_ARM_LPAE is disabled, and virtual address space is 4 GB.

    symptom: BUG: Bad page map in process
    This issue happens while user space requests to unmap an interval of virtual memory. kernel removes the virtual memory interval and walks page table to find the physical page to which the virtual memory is mapping. If the physical page exists, then kernel will clear the page table entry and decrease map count of this mapped page. In this issue, the unmppaed virtual memory maps to a physical free page. It contradicts to unmap a free page and a BUG is triggered.

    [ 518.237101] c1 6508 BUG: Bad page map in process c.sense.browser pte:22880747 pmd:22892831
    [ 518.237115] c1 6508 page:d3847a00 count:0 mapcount:-128 mapping: (null) index:0x0
    [ 518.237122] c1 6508 page flags: 0x14(referenced|dirty)
    [ 518.237138] c1 6508 addr:90130000 vm_flags:000c00fb anon_vma: (null) mapping:efa02118 index:bb
    [ 518.237152] c1 6508 vma->vm_ops->fault: kgsl_gpumem_vm_fault+0x0/0x48
    [ 518.237161] c1 6508 vma->vm_file->f_op->mmap: kgsl_mmap+0x0/0x2b4
    [ 518.237204] c1 6508 kernel BUG at … kernel/mm/memory.c:710!
    [ 518.237212] c1 6508 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
    [ 518.237265] c1 6508 PC is at print_bad_pte+0x154/0x178
    [ 518.237276] c1 6508 LR is at __print_symbol+0x2c/0x4c
    [ 518.239850] c1 6508 Kernel panic - not syncing: c.sense.browser PC:print_bad_pte+0x154/0x178 LR:__print_symbol+0x2c/0x4c
    [ 518.239760] c1 6508 [<c025df78>] (print_bad_pte+0x154/0x178) from [<c025f2d8>] (unmap_single_vma+0x330/0x5a8)
    [ 518.239772] c1 6508 [<c025f2d8>] (unmap_single_vma+0x330/0x5a8) from [<c025fd9c>] (unmap_vmas+0x38/0x54)
    [ 518.239784] c1 6508 [<c025fd9c>] (unmap_vmas+0x38/0x54) from [<c026296c>] (unmap_region+0xa4/0x194)
    [ 518.239795] c1 6508 [<c026296c>] (unmap_region+0xa4/0x194) from [<c026394c>] (do_munmap+0x220/0x288)
    [ 518.239806] c1 6508 [<c026394c>] (do_munmap+0x220/0x288) from [<c02639f0>] (vm_munmap+0x3c/0x50)
    [ 518.239820] c1 6508 [<c02639f0>] (vm_munmap+0x3c/0x50) from [<c01061e0>] (ret_fast_syscall+0x0/0x30)
    [ 518.239829] c1 6508 Code: 0a000002 e5931028 e59f0020 ebfe8752 (e7f001f2)
    

    user space virtual memory management
    Kernel manages the user space virtual address space with a double linked list, current->mm->mmap. While user space tries to unmap a virtual memory interval, kernel removes the interval from the list current->mm->mmap. Kernel also uses a rb tree to manage the same virtual address space to provide optimal time complexity to find the required interval.

    BUG_Bad_page_map_01

    page table
    Since CONFIG_ARM_LPAE is disabled, the page table is 2-level rather than 3-level. In arm v7, within a virtual address, [31..20] bits specify the first level page table offset, [19..12] bits specify the second level page table offset, and [12..0] bits specify the address within the physical page.

    In arm, kernel puts the two consecutive second level page tables together. And it tweaks to use [31..21] bits to specify the first level page table offset, [20..12] bits to specify the second level page table offset.

    BUG_Bad_page_map_02

    debug
    do_munmap might do the following three jobs:

  • Remove vm_area_struct from current->mm->mmap
  • Clear page table entry at current->mm->pg[pgd_offste]->pte[pte_offset]
  • Modify page structure pointed by page table entry
  • In the case, kernel unmaps an interval from 0x90130000 in virtual address space. After walking the page table, kernel concludes that it maps to a physical page whose corresponding page structure is at 0xD3847A00. After decreased by one, page->_mapcount is equal to -129. The page is a Buddy one with order 4. With page->lru, it’s confirmed the page is in a freelist and its predecessor and successor are both order 4 Buddy page, too. So far, it seems that virtual address is not supposed to map to this physical page, let’s walk the page table to see what happens.

    BUG_Bad_page_map_03

    #define PAGE_BUDDY_MAPCOUNT_VALUE (-128)
     
    static inline int PageBuddy(struct page *page)
    {
             return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
    }
     
    inline void __SetPageBuddy(struct page *page)
    {
            VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
            atomic_set(&page->mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
    }
     
    static inline void __ClearPageBuddy(struct page *page)
    {
            VM_BUG_ON_PAGE(!PageBuddy(page), page);
            atomic_set(&page->mapcount, -1);
    

    debug: find the first level page table entry
    The first level page table is task->mm->pgd. Given virtual address is 0x90130000, the first level page table entry is task->mm->pgd[0x480] = 0xE923A400. From ramdump, it appears that in print_bad_pte function the pgd variable have the same value. So it’s probably correct.

    (gdb) p/x 0x90130000 >> 21
    $1 = 0x480
    (gdb) p/x ((pgd_t *)0xE9238000) + 0x480
    $2 = 0xe923a400
    

    BUG_Bad_page_map_04

    debug: find the second level page table
    The first level page table entry points to physical address of the second level page table. The second level page table is 1KB in size and alignment. The physical address of the second level page table is 0x22892800 and 0x31 is flag.

    [0xE923A400] = 0x22892831
    

    As far as second level page table is concerned, arm 32-bit kernel puts two linux page tables and two hardware page tables within a page.

    Within in page 0x22892000:

  • The linux page table 0x22892000 is related to task->mm->pgd[0x480]
  • The linux page table 0x22892400 is related to task->mm->pgd[0x481]
  • The hw page table 0x22892800 is related to task->mm->pgd[0x480]
  • The hw page table 0x22892C00 is related to task->mm->pgd[0x481]
  • task->mm->pgd[0x480] points to the hw page table 0x22892800
  • task->mm->pgd[0x481] points to the hw page table 0x22892C00
  • BUG_Bad_page_map_05

    debug: find the second level page table entry

    (gdb) p/x (0x90130000 >> 12) & ((1<<9)-1)
    $1 = 0x130
    (gdb) p/x ((pte_t *)0x22892000) + 0x130
    $2 = 228924C0
    

    BUG_Bad_page_map_06

    debug: find the physical page
    The content of the second level page table entry was already erased by unmap. From adjacent second level page table entry, the physical address of the page might be 0x22980000. last_kmsg shows that pte is 0x22880000.

    [ 518.237101] c1 6508 BUG: Bad page map in process c.sense.browser pte:22880747 pmd:22892831
    
    mem_map[22880] = 0xD3847A00
    

    BUG_Bad_page_map_07

    conclusion
    The vma should map to the physical page at 0x22980000. But log shows that the physical page is at 0x22880000. It’s possible that a bit-flip happens in cpu cache and the incorrect pte value related to a free page trigger a BUG.


    %d bloggers like this: