Archive for September, 2015

android: init: console service

September 23, 2015

In android: what is console , we discuss what is console. In this post, we discusses what console service is. The reference source code here is CodeAuora LA.BR.1.3.1-09730-8952.0.

$ repo init -u git://codeaurora.org/quic/la/platform/manifest -b refs/tags/LA.BR.1.3.1-09730-8952.0 -m LA.BR.1.3.1-09730-8952.0.xml
$ repo sync -cd -j32

where is console service declared
Console is a service defined in init.rc. This service is a shell which is disabled by default. This service sets console flag.

service console /system/bin/sh
    class core
    console
    disabled
    user shell
    group shell log
    seclabel u:r:shell:s0

when does console service start
init process starts this service when property property:ro.debuggable=1.

on property:ro.debuggable=1
    start console

the standard input/output of console service
Since the console service sets console flag. init process will connect the standard input/output to uart console.

void service_start(struct service *svc, const char *dynamic_args)
{
    struct stat s;
    pid_t pid;
    int needs_console;
......
    needs_console = (svc->flags & SVC_CONSOLE) ? 1 : 0;
    if (needs_console && (!have_console)) {
        ERROR("service '%s' requires console\n", svc->name);
        svc->flags |= SVC_DISABLED;
        return;
    }
......
        if (needs_console) {
            setsid();
            open_console();
}

static void open_console()
{
    int fd;
    if ((fd = open(console_name, O_RDWR)) < 0) {
        fd = open("/dev/null", O_RDWR);
    }
    ioctl(fd, TIOCSCTTY, 0);
    dup2(fd, 0);
    dup2(fd, 1);
    dup2(fd, 2);
    close(fd);
}

debug with uart console
printk driver prints kernel log into all consoles including uart console. Therefore, we could read kernel log from uart cable at runtime. Since console services is a shell whose standard input/output is connected to uart console, we could type in all commands in console as in adb shell. The results of shell commands are displayed in uart console.

If adb shell doesn’t work, it’s very hard to debug why device couldn’t boot successfully. If uart console and console service are both enabled, we could type in logcat in uart console and see android logcat logs from uart console directly.

In this case, we typed in logcat in console uart and found that surfaceflinger service repeatedly crashed and caused zygote and frameworks to restart also.

[   38.049192] ipa-wan ipa_wwan_ioctl:1311 dev(rmnet_data3) register to IPA
[   38.061083] ipa-wan ipa_wwan_ioctl:1311 dev(rmnet_data4) register to IPA
logcat
[   38.088650] ipa-wan ipa_wwan_ioctl:1311 dev(rmnet_data5) register to IPA
[   38.103813] ipa-wan ipa_wwan_ioctl:1311 dev(rmnet_data6) register to IPA
--------- beginning of main
W/hld     (  430): version 1.1
W/hld     (  430): trigger_hdf_i[   38.116226] ipa-wan ipa_wwan_ioctl:1311 dev(rmnet_data7) register to IPA
......
I/DEBUG   (  495): Revision: '0'
I/DEBUG   (  495): ABI: 'arm64'
I/DEBUG   (  495): pid: 2245, tid: 2245, name: surfaceflinger  >>> /system/bin/surfaceflinger <<<
I/DEBUG   (  495): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
I/DEBUG   (  495):     x0   00000055935f4860  x1   0000000000000000  x2   0000000000000000  x3   00000[   62.696430] init: process 'usb_uicc_daemon', pid 522 exited
07fa48e3000
I/DEBUG   (  495):     x4   0000007fa48e3000  x5   0000007fa4b39038  x6   0000000000000001  x7   6566667562656d61
[   62.708979] init: process 'surfaceflinger', pid 2594 exited
I/DEBUG   (  495):     x8   0000000000000000  x9   0000000000000000  x10  0000000000000018  x11  0000000000000001
I/DEBUG   (  495):     x12  0000000000000001  x13  0000007fa4bd8e88  x14  000000559354e000  x15  0000007fa4bd8ed0
I/DEBUG   (  495):     x16  0000007fa48e3588  x17  0000007fa4b196cc  x18  0000000000000000  x19  00000055936b6950
I/DEBUG   (  495):     x20  0000000000000000  x21  00000055936b2600  x22  0000007fa489d46c  x23  0000007fa489d5dc
I/DEBUG   (  495):     x24  0000007fa4aaa016  x25  0[   62.759338] init: process 'surfaceflinger' killing any children in process group
000007fa4aaa000  x26  00000055936b6950  x27  0000000000000000
I/DEBUG   (  495):     x28  0000000000000000  x29  0000007fe769d860  x30  0000007fa4a87d28
I/DEBUG   (  495):     sp   0000007fe769d830  pc   0000007fa4a8996c  pstate 0000000060000000
D/Netd    ( 2100): execIptables: -t filter -D OUTPUT -j fw_OUTPUT -w res=1
D/Netd    ( 2100): execIptables: -t filter -F fw_OUTPUT -w 
I/DEBUG   (  495): 
I/DEBUG   (  495): backtrace:
I/DEBUG   (  495):     #00 pc 000000000003996c  /system/lib64/libsurfaceflinger.so
I/DEBUG   (  495):     #01 pc 0000000000037d24  /system/lib64/libsurfaceflinger.so
I/DEBUG   (  495):     #02 pc 0000[   62.824307] init: service 'zygote' is being killed
000000029c70  /system/lib64/libsurfaceflinger.so (android::SurfaceFlinger::init()+508)
I/DEBUG   (  495):     #03 pc 0000000000000ecc  /system/bin/surfaceflinger
I/DEBUG   (  495):     #04 pc 00000000000138f4  /system/lib64/libc.so (__libc_init+100)
I/DEBUG   (  495):     #05 pc 000000000000102c  /system/bin/surfaceflinger

conclusion
In this post, we discuss what is console service, how it is declared in init.rc, how it is created by init process. Finally, we discuss how developer could tap into console service to get logcat log in uart console to debug boot failure issues.

android: what is console

September 23, 2015

This post is to discuss what is console. What console is depends on the context. It could refer to each below item.

  • a service defined in init.rc: android: init: console service
  • a tty device connected to standard input and output of console service
  • a driver to which print writes kernel log. One of them is connected to above tty device
  • android: arm: bootloader: how (L)ittle (K)ernel loads boot.img

    September 22, 2015

    This post is to discuss how (L)ittle (K)ernel loads boot.img in arm architecture. The reference source code here is CodeAuora LA.BR.1.3.1-09730-8952.0.

    download the reference code

    $ repo init -u git://codeaurora.org/quic/la/platform/manifest -b refs/tags/LA.BR.1.3.1-09730-8952.0 -m LA.BR.1.3.1-09730-8952.0.xml
    $ repo sync -cd -j32
    

    code revision

    • android-4.0.4_r1.2 BUILD_ID=IMM76I
    • kernel 3.0.21 Sneaky Weasel

    boot image format supported by this LK
    In android: boot.img , we discuss boot image format. In this case, the boot image is customized by SOC vendors. This customized boot image has additional device tree part. The image in this part is called dt.img.

    /*
    ** +-----------------+ 
    ** | boot header     | 1 page
    ** +-----------------+
    ** | kernel          | n pages  
    ** +-----------------+
    ** | ramdisk         | m pages  
    ** +-----------------+
    ** | second stage    | o pages
    ** +-----------------+
    ** | device tree     | p pages
    ** +-----------------+
    **
    ** n = (kernel_size + page_size - 1) / page_size
    ** m = (ramdisk_size + page_size - 1) / page_size
    ** o = (second_size + page_size - 1) / page_size
    ** p = (dt_size + page_size - 1) / page_size
    ** 0. all entities are page_size aligned in flash
    

    little kernel memory layout
    Within the memory layout, LK loads boot image into SCRATCH_ADDR at first. In this case, SCRATCH_ADDR functions as temporary buffer.

    static mmu_section_t mmu_section_table[] = {
    /*           Physical addr,         Virtual addr,            Size (in MB),     Flags */
    	{    MEMBASE,               MEMBASE,                 (MEMSIZE / MB),   LK_MEMORY},
    	{    MSM_IOMAP_BASE,        MSM_IOMAP_BASE,          MSM_IOMAP_SIZE,   IOMAP_MEMORY},
    	{    APPS_SS_BASE,          APPS_SS_BASE,            APPS_SS_SIZE,      IOMAP_MEMORY},
    	{    MSM_SHARED_IMEM_BASE,  MSM_SHARED_IMEM_BASE,    1,                COMMON_MEMORY},
    	{    SCRATCH_ADDR,          SCRATCH_ADDR,            512,              SCRATCH_MEMORY},
    	{    MIPI_FB_ADDR,          MIPI_FB_ADDR,            42,              COMMON_MEMORY},
    };
    
    MEMBASE := 0x8F600000 # SDRAM
    MEMSIZE := 0x00300000 # 3MB
    
    BASE_ADDR        := 0x80000000
    SCRATCH_ADDR     := 0x90000000
    

    loads boot image from emmc into DDR

    • find boot partition in emmc
    • int boot_linux_from_mmc(void)
      {
      ......
      	if (!boot_into_recovery) {
      		index = partition_get_index("boot");
      		ptn = partition_get_offset(index);
      		if(ptn == 0) {
      			dprintf(CRITICAL, "ERROR: No boot partition found\n");
                          return -1;
      		}
      	}
      ......
      }
      
    • load boot image header
    • int boot_linux_from_mmc(void)
      {
      ......
      	if (mmc_read(ptn + offset, (uint32_t *) buf, page_size)) {
      		dprintf(CRITICAL, "ERROR: Cannot read boot image header\n");
                      return -1;
      	}
      
      	if (memcmp(hdr->magic, BOOT_MAGIC, BOOT_MAGIC_SIZE)) {
      		dprintf(CRITICAL, "ERROR: Invalid boot image header\n");
                      return -1;
      	}
      ......
      }
      
    • calculate image size
    • Assign image_addr as SCRATCH_ADDR.

      int boot_linux_from_mmc(void)
      {
      ......
      	kernel_actual  = ROUND_TO_PAGE(hdr->kernel_size,  page_mask);
      	ramdisk_actual = ROUND_TO_PAGE(hdr->ramdisk_size, page_mask);
      
      	image_addr = (unsigned char *)target_get_scratch_address();
      
      #if DEVICE_TREE
      	dt_actual = ROUND_TO_PAGE(hdr->dt_size, page_mask);
      	imagesize_actual = (page_size + kernel_actual + ramdisk_actual + dt_actual);
      #else
      	imagesize_actual = (page_size + kernel_actual + ramdisk_actual);
      #endif
      ......
      }
      
    • copy boot image to DDR
    • LK loads boot image into image_addr at first.

      int boot_linux_from_mmc(void)
      {
      ......
      	dprintf(INFO, "Loading boot image (%d): start\n", imagesize_actual);
      	bs_set_timestamp(BS_KERNEL_LOAD_START);
      
      	/* Read image without signature */
      	if (mmc_read(ptn + offset, (void *)image_addr, imagesize_actual))
      	{
      		dprintf(CRITICAL, "ERROR: Cannot read boot image\n");
      		return -1;
      	}
      
      	dprintf(INFO, "Loading boot image (%d): done\n", imagesize_actual);
      ......
      }
      

    load kernel into DDR
    Check if boot image is gzip format. If yes, decompress compressed kernel.

    int boot_linux_from_mmc(void)
    {
    ......
    	if (is_gzip_package((unsigned char *)(image_addr + page_size), hdr->kernel_size))
    	{
    		out_addr = (unsigned char *)(image_addr + imagesize_actual + page_size);
    		out_avai_len = target_get_max_flash_size() - imagesize_actual - page_size;
    		dprintf(INFO, "decompressing kernel image: start\n");
    		rc = decompress((unsigned char *)(image_addr + page_size),
    				hdr->kernel_size, out_addr, out_avai_len,
    				&dtb_offset, &out_len);
    		if (rc)
    		{
    			dprintf(CRITICAL, "decompressing kernel image failed!!!\n");
    			ASSERT(0);
    		}
    
    		dprintf(INFO, "decompressing kernel image: done\n");
    		kptr = (struct kernel64_hdr *)out_addr;
    		kernel_start_addr = out_addr;
    		kernel_size = out_len;
    	} else {
    		kptr = (struct kernel64_hdr *)(image_addr + page_size);
    		kernel_start_addr = (unsigned char *)(image_addr + page_size);
    		kernel_size = hdr->kernel_size;
    	}
    ......
    }
    

    update kernel, ramdisk, atags address

    int boot_linux_from_mmc(void)
    {
    ......
    	/*
    	 * Update the kernel/ramdisk/tags address if the boot image header
    	 * has default values, these default values come from mkbootimg when
    	 * the boot image is flashed using fastboot flash:raw
    	 */
    	update_ker_tags_rdisk_addr(hdr, IS_ARM64(kptr));
    
    	/* Get virtual addresses since the hdr saves physical addresses. */
    	hdr->kernel_addr = VA((addr_t)(hdr->kernel_addr));
    	hdr->ramdisk_addr = VA((addr_t)(hdr->ramdisk_addr));
    	hdr->tags_addr = VA((addr_t)(hdr->tags_addr));
    ......
    }
    
    static void update_ker_tags_rdisk_addr(struct boot_img_hdr *hdr, bool is_arm64)
    {
    	/* overwrite the destination of specified for the project */
    #ifdef ABOOT_IGNORE_BOOT_HEADER_ADDRS
    	if (is_arm64)
    		hdr->kernel_addr = ABOOT_FORCE_KERNEL64_ADDR;
    	else
    		hdr->kernel_addr = ABOOT_FORCE_KERNEL_ADDR;
    	hdr->ramdisk_addr = ABOOT_FORCE_RAMDISK_ADDR;
    	hdr->tags_addr = ABOOT_FORCE_TAGS_ADDR;
    #endif
    }
    
    DEFINES += CRYPTO_BAM=1
    DEFINES += SPMI_CORE_V2=1
    DEFINES += ABOOT_IGNORE_BOOT_HEADER_ADDRS=1
    
    #define SDRAM_START_ADDR                   0x80000000
    
    #define DDR_START                          get_ddr_start()
    #define ABOOT_FORCE_KERNEL_ADDR            DDR_START + 0x8000
    #define ABOOT_FORCE_KERNEL64_ADDR          DDR_START + 0x80000
    #define ABOOT_FORCE_RAMDISK_ADDR           DDR_START + 0x2000000
    #define ABOOT_FORCE_TAGS_ADDR              DDR_START + 0x1E00000
    

    move kernel, ramdisk and device tree to correct address

    int boot_linux_from_mmc(void)
    {
    ......
    	/* Move kernel, ramdisk and device tree to correct address */
    	memmove((void*) hdr->kernel_addr, kernel_start_addr, kernel_size);
    	memmove((void*) hdr->ramdisk_addr, (char *)(image_addr + page_size + kernel_actual), hdr->ramdisk_size);
    ......
    }
    

    setup atags
    This version of LK supports loading device tree from dt.img. It also supports decompress dt.img if it is compressed in gzip format. The LK also supports loading device tree appended to kernel.

    int boot_linux_from_mmc(void)
    {
    ......
    #if DEVICE_TREE
    	if(hdr->dt_size) {
    		dt_table_offset = ((uint32_t)image_addr + page_size + kernel_actual + ramdisk_actual + second_actual);
    		table = (struct dt_table*) dt_table_offset;
    
    		if (dev_tree_validate(table, hdr->page_size, &dt_hdr_size) != 0) {
    			dprintf(CRITICAL, "ERROR: Cannot validate Device Tree Table \n");
    			return -1;
    		}
    
    		/* Find index of device tree within device tree table */
    		if(dev_tree_get_entry_info(table, &dt_entry) != 0){
    			dprintf(CRITICAL, "ERROR: Getting device tree address failed\n");
    			return -1;
    		}
    
    		if (is_gzip_package((unsigned char *)dt_table_offset + dt_entry.offset, dt_entry.size))
    		{
    			unsigned int compressed_size = 0;
    			out_addr += out_len;
    			out_avai_len -= out_len;
    			dprintf(INFO, "decompressing dtb: start\n");
    			rc = decompress((unsigned char *)dt_table_offset + dt_entry.offset,
    					dt_entry.size, out_addr, out_avai_len,
    					&compressed_size, &dtb_size);
    			if (rc)
    			{
    				dprintf(CRITICAL, "decompressing dtb failed!!!\n");
    				ASSERT(0);
    			}
    
    			dprintf(INFO, "decompressing dtb: done\n");
    			best_match_dt_addr = out_addr;
    		} else {
    			best_match_dt_addr = (unsigned char *)dt_table_offset + dt_entry.offset;
    			dtb_size = dt_entry.size;
    		}
    
    		/* Validate and Read device device tree in the tags_addr */
    		if (check_aboot_addr_range_overlap(hdr->tags_addr, dtb_size))
    		{
    			dprintf(CRITICAL, "Device tree addresses overlap with aboot addresses.\n");
    			return -1;
    		}
    
    		memmove((void *)hdr->tags_addr, (char *)best_match_dt_addr, dtb_size);
    	} else {
    		/* Validate the tags_addr */
    		if (check_aboot_addr_range_overlap(hdr->tags_addr, kernel_actual))
    		{
    			dprintf(CRITICAL, "Device tree addresses overlap with aboot addresses.\n");
    			return -1;
    		}
    		/*
    		 * If appended dev tree is found, update the atags with
    		 * memory address to the DTB appended location on RAM.
    		 * Else update with the atags address in the kernel header
    		 */
    		void *dtb;
    		dtb = dev_tree_appended((void*)(image_addr + page_size),
    					hdr->kernel_size, dtb_offset,
    					(void *)hdr->tags_addr);
    		if (!dtb) {
    			dprintf(CRITICAL, "ERROR: Appended Device Tree Blob not found\n");
    			return -1;
    		}
    	}
    	#endif
    ......
    }
    

    boot linux

  • If device tree is enabled, cmdline is stored in bootargs node of device tree.
  • In update_cmdline(), LK loads cmdline from boot image header, and adds the commands in this cmdline into bootargs node of device tree.
  • LK is arm-32 bit. If kernel is also arm-32 bit, then LK could jump to kernel directly with entry(0, machtype, (unsigned*)tags_phys)
  • According to Documentation/arm/Booting, before jumping to kernel, r0 is set as 0, r1 is set as machine type, and r3 is set as atags address.
  • int boot_linux_from_mmc(void)
    {
    ......
    	boot_linux((void *)hdr->kernel_addr, (void *)hdr->tags_addr,
    		   (const char *)hdr->cmdline, board_machtype(),
    		   (void *)hdr->ramdisk_addr, hdr->ramdisk_size);
    ......
    }
    
    typedef void entry_func_ptr(unsigned, unsigned, unsigned*);
    void boot_linux(void *kernel, unsigned *tags,
    		const char *cmdline, unsigned machtype,
    		void *ramdisk, unsigned ramdisk_size)
    {
    	unsigned char *final_cmdline;
    #if DEVICE_TREE
    	int ret = 0;
    #endif
    
    	void (*entry)(unsigned, unsigned, unsigned*) = (entry_func_ptr*)(PA((addr_t)kernel));
    	uint32_t tags_phys = PA((addr_t)tags);
    	struct kernel64_hdr *kptr = (struct kernel64_hdr*)kernel;
    
    	ramdisk = (void *)PA((addr_t)ramdisk);
    
    	final_cmdline = update_cmdline((const char*)cmdline);
    
    #if DEVICE_TREE
    	dprintf(INFO, "Updating device tree: start\n");
    
    	/* Update the Device Tree */
    	ret = update_device_tree((void *)tags,(const char *)final_cmdline, ramdisk, ramdisk_size);
    	if(ret)
    	{
    		dprintf(CRITICAL, "ERROR: Updating Device Tree Failed \n");
    		ASSERT(0);
    	}
    	dprintf(INFO, "Updating device tree: done\n");
    #else
    	/* Generating the Atags */
    	generate_atags(tags, final_cmdline, ramdisk, ramdisk_size);
    #endif
    ......
    	/* do any platform specific cleanup before kernel entry */
    	platform_uninit();
    
    	arch_disable_cache(UCACHE);
    
    #if ARM_WITH_MMU
    	arch_disable_mmu();
    #endif
    	bs_set_timestamp(BS_KERNEL_ENTRY);
    
    	if (IS_ARM64(kptr))
    		/* Jump to a 64bit kernel */
    		scm_elexec_call((paddr_t)kernel, tags_phys);
    	else
    		/* Jump to a 32bit kernel */
    		entry(0, machtype, (unsigned*)tags_phys);
    }
    

    conclusion
    This post goes through the code flow in which how LK loads boot.img into DDR, sets up kernel, ramdisk, atgas, and then jump to kernel. LK is arm 32-bit. If kernel is arm 32-bit, LK could jump to kernel directly. If kernel is arm-64 bit, LK couldn’t jump to kernel directly. We’ll cover this topic in another post.

    android: bootimg: modify ramdisk in boot.img

    September 14, 2015

    In android: boot.img, we discuss what boot.img is. In this post, we further discussing how to modify ramdisk in boot.img.

    Below is an example to modify init.rc in the ramdisk of a boot.img.

  • download android-bootimg-scripts
  • $ git clone git@github.com:chengyihe/android-bootimg-scripts.git
    
  • break the boot.img into kernel and ramdisk(dt if it exists)
  • $ bootimg_parser.py --bootimg=boot.img
    $ ls
    boot.img  kernel  ramdisk
    
  • break ramdisk
  • $ mv ramdisk ramdisk.img
    $ mkdir ramdisk_dir
    $ cd ramdisk_dir
    $ gunzip -c ../ramdisk.img | cpio -i
    
  • modify ramdisk
  • $ vi init.rc
    
  • re-compile ramdisk
  • $ find . | cpio -o -H newc | gzip > ../newramdisk.img
    $ cd ..
    
  • update ramdisk in boot.img
  • $ bootimg_updater.py --bootimg=boot.img --ramdisk=newramdisk.img --output=newboot.img
    
  • check if init.rc is modified after newboot.img is flashed into device and device reboots
  • $ adb pull init.rc 
    

    android: boot.img

    September 14, 2015

    This post is to discuss boot.img. We discuss how to build it, what it is composed of and how bootloader processes it. The reference source code here is android-5.1.1_r18.

    how to build boot.img
    After getting the source code of the device you are developing, you could build it with android build system.

    $ source build/envsetup.sh
    $ lunch ${project}-eng
    $ make bootimg
    

    After build pass, you will find boot.img in out/target/product/${project}/boot.img

    what is boot.img composed of
    In AOSP boot.img format, the boot.img is composed of kernel and ramdisk.

    /*
    ** +-----------------+ 
    ** | boot header     | 1 page
    ** +-----------------+
    ** | kernel          | n pages  
    ** +-----------------+
    ** | ramdisk         | m pages  
    ** +-----------------+
    ** | second stage    | o pages
    ** +-----------------+
    **
    ** n = (kernel_size + page_size - 1) / page_size
    ** m = (ramdisk_size + page_size - 1) / page_size
    ** o = (second_size + page_size - 1) / page_size
    

    In another post, android: arm: bootloader: how (L)ittle (K)ernel loads boot.img, we see that some vendor customizes the format of boot image and its corresponding bootloader should support loading these customized boot images.

    where is device tree if boot.img has only kernel and ramdisk
    The device tree is appended to kernel. And the result combined image name is Image-dtb, zImage-dtb, or Image.gz-dtb. bootloader could separate kernel and dt from these images.

    where is boot.img installed
    The method to install boot.img depends. This image is usually installed in an emmc partition.

    how bootloader loads boot.img
    In another post, android: arm: bootloader: how (L)ittle (K)ernel loads boot.img, we introduce how (L)ittle (K)ernel loads boot.img and handovers the control of CPU to kernel.

    conclusion
    In the post, we discuss the format of boot image and how bootloader loads boot image. In another post, android: arm: bootloader: how (L)ittle (K)ernel loads boot.img, we introduce SOC vendors’s customized boot image and how its corresponding bootloader loads the customized boot image.

    android: mm: dynamic memory system

    September 7, 2015

    In arm: mm: memory usage distribution , we discuss the spectrum of memory usage distributions. In this post, we discuss the dynamic memory system in which the memory usage distribution changes over time. This post only discuss the condition at which SwapTotal = 0 to simplify explanations.

    memory status after boot
    After boot, Cached is high and MemFree is low. Under this condition, both memory efficiency and performance is high.
    dynamic_memory_01

    while MemFree is below watermark
    If some applications are launched and MemFree is below watermark, then kswapd is triggered to reclaim and trigger shrinkers.

    • Reclaim transforms the cached pages into free pages. Its effect is to move memory status along the MemFree+Cached=k.
    • Shrinkers shrink driver cached pages and return them back to free pages. Its effect is to push the memory status away from the MemFree+Cached=k.

    dynamic_memory_02

    while MemFree + Cached is lower
    If memory pressure is high, the shrinkers might have little effects. The memory status might only move along the MemFree+Cached=k.
    dynamic_memory_03

    while both MemFree, Cached are lower than LMK(lowmemorykiller) threshold
    If memory pressure is higher, the reclaim moves the memory status into the square in which both MemFree and Cached are below LMK threshold. Then LMK is triggered to move the memory status away from the MemFree+Cached=k.
    dynamic_memory_04

    while reclaim efficiency is low
    If cached is too low, then OOM(out of memory) killer will be triggered to move the memory status away from the MemFree+Cached=k.
    dynamic_memory_05

    while file cache is too low
    This might be the worst case. The cache is already very low, but reclaim still tries to move the memory status along up the MemFree+Cached=k. Another force to request file caches will move the memory status along down the MemFree+Cached=K. The two forces counteract with each other and make the memory status stall at the point. Under this condition, the system performance is too low to trigger frameworks watchdog timeout.

    The post android: device freeze: some thread are blocked at shrink_inactive_list
    discuss a case in which MemFree accounting is incorrectly high which disables LMK. Since memory status couldn’t be move away from MemFree+Cached=k, the system keep reclaiming and make Cached too low. Finally the device is freeze.
    dynamic_memory_06

    conclusion
    This methodology makes it easy to analyze Android dynamic memory system. The reclaim could help the memory status move along the MemFree+Cached=k to increase MemFree. But reclaim couldn’t increase MemFree+Cached. On the other hand, the LMK and OOM killer could increase MemFree+Cached, i.e., move the memory status away from MemFree+Cached=k. It’s better to trigger LMK rather than OOM killer in Android system in that OOM killer might kill important process such as SystemSever at first. On the contrast, the LMK prefers to choose the less important applications to kill.

    android: binder: binder thread encoutners native crash due to bad commands from binder driver

    September 4, 2015

    This post is to discuss a case in which a binder thread encoutners native crash due to bad commands from binder driver. The reference code is qualcomm msm8974 release in which frameworks is KitKat 4.4.2 and the kernel is Linux 3.4.0.

    Symptom: IPCThreadState: *** BAD COMMAND 29190 received from Binder driver, and then the binder thread calls abort
    For this case, I’ll introduce many binder transaction examples, and finally demonstrate in which condition does this case happen.

    06-14 21:46:25.958  1067  1526 E IPCThreadState: *** BAD COMMAND 29190 received from Binder driver
    06-14 21:46:25.958  1067  1526 E IPCThreadState: getAndExecuteCommand(fd=12) returned unexpected error -2147483648, aborting
    06-14 21:46:25.958  1067  1526 F libc    : Fatal signal 6 (SIGABRT) at 0x0000042b (code=-6), thread 1526 (Binder_B)
    
    pid: 1067, tid: 1526, name: Binder_B  >>> system_server <<<
    signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
        r0 00000000  r1 000005f6  r2 00000006  r3 00000000
        r4 00000006  r5 0000000c  r6 000005f6  r7 0000010c
        r8 4019b081  r9 81c4a000  sl 81c49bac  fp 4013f2f4
        ip 00000000  sp 81d47cc0  lr 40103c79  pc 40112c70  cpsr 000f0010
        d0  0000000000000000  d1  0000000000000000
        d2  0000000000000000  d3  0000000000000000
        d4  3a30333030303030  d5  3730303936303020
        d6  3030653630302034  d7  3038363030206636
        d8  0000000000000000  d9  0000000000000000
        d10 0000000000000000  d11 0000000000000000
        d12 0000000000000000  d13 0000000000000000
        d14 0000000000000000  d15 0000000000000000
        d16 6e72757465722029  d17 7078656e75206465
        d18 520a30203a657a69  d19 2064657669656365
        d20 73646e616d6d6f63  d21 7264206d6f726620
        d22 3030203a72657669  d23 3020633032373030
        d24 3f56c16c16c76a94  d25 3f81111111185da8
        d26 3fa555555555551c  d27 3fc55555555554db
        d28 3fe0000000000000  d29 0000000000000001
        d30 fff0000000000000  d31 4000000000000000
        scr 68000011
    
    backtrace:
        #00  pc 00022c70  /system/lib/libc.so (tgkill+12)
        #01  pc 00013c75  /system/lib/libc.so (pthread_kill+48)
        #02  pc 00013e89  /system/lib/libc.so (raise+10)
        #03  pc 00012a23  /system/lib/libc.so
        #04  pc 00022524  /system/lib/libc.so (abort+4)
        #05  pc 00021809  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+124)
        #06  pc 0002658d  /system/lib/libbinder.so
        #07  pc 00012635  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+216)
        #08  pc 0004fdf7  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+98)
        #09  pc 000120b7  /system/lib/libutils.so
        #10  pc 0000d2c0  /system/lib/libc.so (__thread_entry+72)
        #11  pc 0000d458  /system/lib/libc.so (pthread_create+240)
    

    preliminary: example#1: an asynchronous binder transaction

    1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).

    transaction_01

    preliminary: example#2: a synchronous binder transaction

    1. The client thread(3684:3684) requests a synchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).

    transaction_02

    preliminary: example#3: an asynchronous binder transaction which triggers another synchronous binder transaction

    1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the binder thread(1067:1526).
    2. The client thread(1067:1526) requests a synchronous binder transaction to a binder service provided by the process(387). This transaction is served by the binder thread(387:1008).

    transaction_03

    preliminary: example#4: two asynchronous binder transactions to the same service, each further triggering another binder transaction

    1. The client thread(3684:3684) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the thread(1067:1526).
    2. The client thread(1067:1526) requests a synchronous binder transaction to a binder service provided by the process(387). This transaction is served by the binder thread(387:1008).
    3. The client thread(1726:1726) requests an asynchronous binder transaction to a binder service provided by the process(1067). This transaction is served by the thread(1067:1526).
    4. The client thread(1067:1526) requests an asynchronous binder transaction to a binder service provided by the process(4267). This transaction is served by the binder thread(4267:5488).

    transaction_04

    Why the asynchronous transaction requested by the client thread(1726:1726) is blocked until the asynchronous transaction requested by the client thread(3684:3684) is completed by the binder thread(1067:1526)?

    • Both the client thread(3684:3684) and the client thread(1726:1726) request an asynchronous transaction to the same binder service provided by the process(1067).
    • The asynchronous transaction requested by the client thread(1726:1726) is en-queued into the binder service’s todo list since another asynchronous transaction to the same binder service requested by the client thread(3684:3684) is being served.
    • The en-queued asynchronous transaction is moved into the binder thread(1067:1526)’s todolist after the binder thread(1067:1526) completes the asynchronous transaction requested by client thread(3684:3684).
    • The binder thread(1067:1526) de-queues and executes the asynchronous binder transaction requested by the client thread(1726:1726).

    analysis: what could make this symptom happen
    If a BC_FREE_BUFFER command is sent as below diagram, then the following situation would happen:

    1. The binder thread(1726:1726) receives two BR_TRANSACTION commands consecutively.
    2. The binder thread(1726:1726) requests two BC_TRANSACTION commands consecutively.
    3. The binder thread(1726:1726) receives two BC_TRANSACTION_COMPLETE consecutively.

    This symptom happens while the binder threads receives two BR_TRANSACTION_COMPLETE commands consecutively. So the problem is reduced to why the binder thread(1726:1726) requests BC_FREE_BUFFER unexpectedly.
    transaction_05

    analysis: why the binder thread(1726:1726) requests BC_FREE_BUFFER unexpectedly
    The BC_FREE_BUFFER commands is executed in processPendingDerefs() in which some binder reference is clear and a destructor frees a binder buffer. This bug doesn’t appear in AOSP and pure vendor release.

    void IPCThreadState::joinThreadPool(bool isMain)
    {
        LOG_THREADPOOL("**** THREAD %p (PID %d) IS JOINING THE THREAD POOL\n", (void*)pthread_self(), getpid());
        mOut.writeInt32(isMain ? BC_ENTER_LOOPER : BC_REGISTER_LOOPER);
        
        // This thread may have been spawned by a thread that was in the background
        // scheduling group, so first we will make sure it is in the foreground
        // one to avoid performing an initial transaction in the background.
        set_sched_policy(mMyThreadId, SP_FOREGROUND);
            
        status_t result;
        do {
            processPendingDerefs();
            // now get the next command to be processed, waiting if necessary
            result = getAndExecuteCommand();
            if (result < NO_ERROR && result != TIMED_OUT && result != -ECONNREFUSED && result != -EBADF) {
                ALOGE("getAndExecuteCommand(fd=%d) returned unexpected error %d, aborting",
                      mProcess->mDriverFD, result);
                abort();
            }
            
            // Let this thread exit the thread pool if it is no longer
            // needed and it is not the main process thread.
            if(result == TIMED_OUT && !isMain) {
                break;
            }
        } while (result != -ECONNREFUSED && result != -EBADF);
    

    conclusion
    In IPCThreadState::joinThreadPool(), the message loop of a binder thread, if processPendiDerefs() calls a destructor which calls freeBuffer, then it might send a unexpected BC_FREE_BUFFER command. The unexpected BC_FREE_BUFFER causes two consecutive BR_TRANSACTIONS, two consecutive BC_TRANSACTIONS, and two consecutive BR_TRANSACTION_COMPLETE. Finally, the binder thread will detect a BAD COMMAND and call abort to exit the process.


    %d bloggers like this: