Archive for July, 2015

vi: edit files in binary mode

July 30, 2015

This post shows how to edit files in binary mode.

  • Create a binary file
  • $ dd if=/dev/zero of=a.out bs=16 count=2
    2+0 records in
    2+0 records out
    32 bytes (32 B) copied, 8.8878e-05 s, 360 kB/s
    
  • Edit the file in binary mode
  • $ vi -b a.out
    ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
    
  • Filter the whole buffer through xxd, replacing the buffer contents with the hex dump that xxd prints
  • :%!xxd
    0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    
  • Change the second byte to 0xab
  • 0000000: 00ab 0000 0000 0000 0000 0000 0000 0000  ................
    0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    
  • Filter the whole buffer through xxd -r, replacing the hex dump with the binary data that xxd -r reconstructs
  • :%!xxd -r
    ^@?^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
    
  • Save and close the file
  • :wq
    
  • xxd shows the binary file is updated as expected
  • $ xxd a.out
    0000000: 00ab 0000 0000 0000 0000 0000 0000 0000  ................
    0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
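
For a one-off patch, the same change can be scripted instead of made interactively. The sketch below is a minimal C helper and not part of the vi workflow above; patch_byte is a hypothetical name, and the code assumes the file already exists and is at least offset + 1 bytes long.

```c
#include <stdio.h>

/* Overwrite a single byte at `offset` in an existing file.
 * Returns 0 on success, -1 on any I/O error.
 * patch_byte is a hypothetical helper, not a standard API. */
int patch_byte(const char *path, long offset, unsigned char value)
{
    FILE *f = fopen(path, "r+b");   /* read/update mode: no truncation */
    if (!f)
        return -1;

    /* Seek to the byte and overwrite it in place. */
    if (fseek(f, offset, SEEK_SET) != 0 || fputc(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling patch_byte("a.out", 1, 0xab) on the file created with dd should leave it in the same state as the vi session, which can be checked with xxd a.out.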
    

    arm: mm: memory layout

    July 29, 2015

    This post discusses the memory layout of the arm 32-bit architecture. The reference source code is the qualcomm msm kernel release 3.4.0.

    By default, the MMU is enabled in the arm 32-bit architecture. To understand the memory layout of an arm 32-bit device, one needs to figure out its physical and virtual memory layouts and how the two address spaces map to each other.

    config MMU
            bool "MMU-based Paged Memory Management Support"
            default y
            help
              Select if you want MMU-based virtualised addressing space
              support by paged memory management. If unsure, say 'Y'. 
    

    physical memory layout
    The physical memory layout of a device is described by the banks of meminfo. Each bank corresponds to a contiguous address space.

    struct membank {
            phys_addr_t start; 
            phys_addr_t size;
            unsigned int highmem; 
    };
    
    struct meminfo {
            int nr_banks; 
            struct membank bank[NR_BANKS];
    };
            
    extern struct meminfo meminfo;
    
    /*
     * This keeps memory configuration data used by a couple memory
     * initialization functions, as well as show_mem() for the skipping
     * of holes in the memory map.  It is populated by arm_add_memory().
     */
    struct meminfo meminfo;
    

    Take msm8974 as an example. The physical memory of msm8974 starts at 0x00000000. The bootloader might pass the memory size to the kernel via the kernel command line or tags. Assume that meminfo has one memory bank ranging from 0x00000000 to 0x80000000.

    config PHYS_OFFSET                                                                                                                                                                                        
            hex
            default "0x40800000" if ARCH_MSM9615                                                                                                                                                              
            default "0x80200000" if ARCH_APQ8064
            default "0x80200000" if ARCH_MSM8960                                                                                                                                                              
            default "0x80200000" if ARCH_MSM8930
            default "0x00000000" if ARCH_MSM8974                                                                                                                                                              
            default "0x00000000" if ARCH_APQ8084
            default "0x00000000" if ARCH_MPQ8092                                                                                                                                                              
            default "0x00000000" if ARCH_MSM8226
            default "0x00000000" if ARCH_MSM8610                                                                                                                                                              
            default "0x00000000" if ARCH_MSMSAMARIUM
            default "0x10000000" if ARCH_FSM9XXX                                                                                                                                                              
            default "0x00000000" if ARCH_FSM9900
            default "0x00200000" if ARCH_MSM9625                                                                                                                                                              
            default "0x00000000" if ARCH_MSMKRYPTON
            default "0x00200000" if !MSM_STACKED_MEMORY                                                                                                                                                       
            default "0x00000000" if ARCH_QSD8X50 && MSM_SOC_REV_A
            default "0x20000000" if ARCH_QSD8X50                                                                                                                                                              
            default "0x40200000" if ARCH_MSM8X60
            default "0x10000000"
    
    static int __init parse_tag_mem32(const struct tag *tag)
    {
            return arm_add_memory(tag->u.mem.start, tag->u.mem.size);
    }
    __tagtable(ATAG_MEM, parse_tag_mem32);
    

    During initialization, two memory holes split the default memory bank into three banks. The first memory hole ranges from 93 MB to 218 MB. The second memory hole ranges from 250 MB to 255 MB. In msm devices, the memory holes used by the modem are located in the first 256 MB. This lets a single modem ROM run against DDRs of different sizes.

    / {
            model = "Qualcomm MSM 8974";
            compatible = "qcom,msm8974";
            ......
            memory_hole: qcom,msm-mem-hole {
                    compatible = "qcom,msm-mem-hole";
                    qcom,memblock-remove = <0x5d00000 0x7d00000
                                            0xfa00000 0x500000>; /* Address and Size of Hole */
            };
            ......
    }  
    
    /*
     * Function to scan the device tree and adjust the meminfo table to
     * reflect the memory holes.
     */
    int __init dt_scan_for_memory_hole(unsigned long node, const char *uname,
                    int depth, void *data)
    {
            unsigned int *memory_remove_prop;
            unsigned long memory_remove_prop_length;
            unsigned long hole_start;
            unsigned long hole_size;
            unsigned int num_holes = 0;
            int i = 0;
    
            memory_remove_prop = of_get_flat_dt_prop(node,
                                                    "qcom,memblock-remove",
                                                    &memory_remove_prop_length);
    
            if (memory_remove_prop) {
                    if (!check_for_compat(node))
                            goto out;
            } else {
                    goto out;
            }   
    
            if (memory_remove_prop) {
                    if (!memory_remove_prop_length || (memory_remove_prop_length %
                            (2 * sizeof(unsigned int)) != 0)) {
                            WARN(1, "Memory remove malformed\n");
                            goto out;
                    }   
    
                    num_holes = memory_remove_prop_length /
                                            (2 * sizeof(unsigned int));
    
                    for (i = 0; i < (num_holes * 2); i += 2) {
                            hole_start = be32_to_cpu(memory_remove_prop[i]);
                            hole_size = be32_to_cpu(memory_remove_prop[i+1]);
    
                            adjust_meminfo(hole_start, hole_size);
                    }   
            }   
    
    out:
            return 0;
    }
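
To make the bank splitting concrete, here is a user-space sketch that carves holes out of a list of banks. It is not the kernel's adjust_meminfo(), whose body is not shown in this post; struct bank, remove_hole and MAX_BANKS are hypothetical names, and the caller is assumed to provide an array with room for MAX_BANKS entries.

```c
#include <stdint.h>

#define MAX_BANKS 8   /* hypothetical capacity of the bank array */

struct bank {
    uint64_t start;
    uint64_t size;
};

/* Remove [hole_start, hole_start + hole_size) from the first n banks.
 * banks must have room for MAX_BANKS entries. Returns the new bank count. */
int remove_hole(struct bank *banks, int n, uint64_t hole_start, uint64_t hole_size)
{
    uint64_t hole_end = hole_start + hole_size;
    struct bank out[MAX_BANKS];
    int m = 0;

    for (int i = 0; i < n; i++) {
        uint64_t s = banks[i].start;
        uint64_t e = s + banks[i].size;

        if (hole_end <= s || hole_start >= e) {
            /* The hole does not touch this bank: keep it unchanged. */
            if (m < MAX_BANKS)
                out[m++] = banks[i];
            continue;
        }
        /* Keep the part below the hole, if any. */
        if (hole_start > s && m < MAX_BANKS)
            out[m++] = (struct bank){ s, hole_start - s };
        /* Keep the part above the hole, if any. */
        if (hole_end < e && m < MAX_BANKS)
            out[m++] = (struct bank){ hole_end, e - hole_end };
    }

    for (int i = 0; i < m; i++)
        banks[i] = out[i];
    return m;
}
```

Starting from the single bank 0x00000000 to 0x80000000 and removing the two holes from the device tree (0x5d00000 with size 0x7d00000, then 0xfa00000 with size 0x500000) leaves three banks: 0 to 93 MB, 218 MB to 250 MB, and 255 MB to 2 GB.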
    

    virtual memory layout
    In the arm 32-bit architecture, the virtual address space is 4 GB. With the default 3G/1G split, the lower 3 GB is user space while the upper 1 GB is kernel space. Within the kernel address space, the vmalloc address space occupies the higher part while lowmem occupies the lower part. The default size of the vmalloc address space in arm 32-bit is 240 MB. The bootloader might set the vmalloc size via the command line or tags. In the example below, the size of the vmalloc address space is 400 MB and the size of the lowmem address space is 600 MB.

    choice
            prompt "Memory split"
            default VMSPLIT_3G
            help
              Select the desired split between kernel and user memory.
    
              If you are not absolutely sure what you are doing, leave this 
              option alone!
    
            config VMSPLIT_3G
                    bool "3G/1G user/kernel split"
            config VMSPLIT_2G
                    bool "2G/2G user/kernel split"
            config VMSPLIT_1G
                    bool "1G/3G user/kernel split"
    endchoice
    config PAGE_OFFSET
            hex  
            default 0x40000000 if VMSPLIT_1G
            default 0x80000000 if VMSPLIT_2G
            default 0xC0000000
    
    [ 0.000000] c0 0 Virtual kernel memory layout:
    [ 0.000000] c0 0 vector : 0xffff0000 - 0xffff1000 ( 4 kB)
    [ 0.000000] c0 0 fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)
    [ 0.000000] c0 0 vmalloc : 0xe6000000 - 0xff000000 ( 400 MB)
    [ 0.000000] c0 0 lowmem : 0xc0000000 - 0xe5800000 ( 600 MB)
    [ 0.000000] c0 0 pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB)
    [ 0.000000] c0 0 modules : 0xbf000000 - 0xbfe00000 ( 14 MB)
    [ 0.000000] c0 0 .text : 0xc0008000 - 0xc0b51420 (11558 kB)
    [ 0.000000] c0 0 .init : 0xc0c00000 - 0xc0d03f00 (1040 kB)
    [ 0.000000] c0 0 .data : 0xc0d04000 - 0xc0e50e70 (1332 kB)
    

    memory mapping

    Linear mapping is used to transform addresses between the lowmem address spaces in virtual and physical memory layouts. It’s also used to determine kernel memory mapping during initialization.

    #define __virt_to_phys(x)       ((x) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(x)       ((x) - PHYS_OFFSET + PAGE_OFFSET)
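
As a worked instance of the two macros, the sketch below turns them into plain functions for 32-bit addresses. The PAGE_OFFSET and PHYS_OFFSET values are the ones used in the examples later in this post and are assumptions for illustration; virt_to_phys32 and phys_to_virt32 are hypothetical names.

```c
#include <stdint.h>

/* Example values, assumed for illustration only. */
#define PAGE_OFFSET 0xC0000000u   /* start of kernel virtual space */
#define PHYS_OFFSET 0x80000000u   /* start of physical memory */

/* Function forms of __virt_to_phys() and __phys_to_virt(). */
uint32_t virt_to_phys32(uint32_t virt)
{
    return virt - PAGE_OFFSET + PHYS_OFFSET;
}

uint32_t phys_to_virt32(uint32_t phys)
{
    return phys - PHYS_OFFSET + PAGE_OFFSET;
}
```

With these values, the virtual address 0xC0008000 maps to the physical address 0x80008000, and mapping back returns the original virtual address. The result is only meaningful for addresses inside lowmem.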
    

    In the virtual memory layout, by default, VMALLOC_END is 16 MB below the end of the virtual address space. Assume the bootloader passes vmalloc=408MB. Then the kernel sets vmalloc_min to VMALLOC_END - 408 MB. VMALLOC_START is the first 8 MB aligned address higher than vmalloc_min. The difference between vmalloc_min and VMALLOC_START provides a gap between the lowmem address space and the vmalloc address space to detect memory corruption.

    In the physical memory layout, the meminfo has two banks separated by a 300 MB hole. The kernel uses virt_to_phys(vmalloc_min) to determine whether a bank is highmem or not. If virt_to_phys(vmalloc_min) lies within a bank, the bank is divided into two banks. The higher bank is highmem while the lower one is lowmem.

    The variable arm_lowmem_limit is equal to the highest address of all lowmem banks. The variable high_memory is equal to phys_to_virt(arm_lowmem_limit). high_memory functions as the end of lowmem in the virtual memory layout.

    In a nutshell, the requested vmalloc size determines vmalloc_min and VMALLOC_START in the virtual memory layout. The linear mapping of vmalloc_min determines the lowmem and highmem regions in the physical memory layout. Then the end of the lowmem address space in the physical memory layout, arm_lowmem_limit, is linearly mapped back to the virtual address space to determine the variable high_memory, the end of the lowmem address space in the virtual memory layout.

    memory_layout_mapping_01
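
The derivation above can be sketched in user space for the simple case of one bank that starts at PHYS_OFFSET. This is an illustration under stated assumptions, not kernel code: struct layout and derive_layout are hypothetical names, PHYS_OFFSET is an example value, and the rounding of VMALLOC_START is assumed to be "add 8 MB to high_memory, then round down to an 8 MB boundary".

```c
#include <stdint.h>

#define MB(x)           ((uint32_t)(x) << 20)
#define PAGE_OFFSET     0xC0000000u
#define PHYS_OFFSET     0x80000000u   /* assumed example value */
#define VMALLOC_END     0xFF000000u   /* 16 MB below the top of the address space */
#define VMALLOC_OFFSET  MB(8)         /* assumed guard gap and alignment */

struct layout {
    uint32_t vmalloc_min;       /* virtual */
    uint32_t arm_lowmem_limit;  /* physical */
    uint32_t high_memory;       /* virtual */
    uint32_t vmalloc_start;     /* virtual */
};

/* Derive the layout for a single bank [PHYS_OFFSET, bank_end). */
struct layout derive_layout(uint32_t vmalloc_reserve, uint32_t bank_end)
{
    struct layout l;

    /* 1. The requested vmalloc size fixes vmalloc_min. */
    l.vmalloc_min = VMALLOC_END - vmalloc_reserve;

    /* 2. Its linear mapping is the lowmem/highmem split in physical memory. */
    uint32_t split = l.vmalloc_min - PAGE_OFFSET + PHYS_OFFSET;
    l.arm_lowmem_limit = bank_end < split ? bank_end : split;

    /* 3. Map the lowmem limit back to get the virtual end of lowmem. */
    l.high_memory = l.arm_lowmem_limit - PHYS_OFFSET + PAGE_OFFSET;

    /* 4. vmalloc starts at the next 8 MB boundary after the guard gap. */
    l.vmalloc_start = (l.high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET - 1);
    return l;
}
```

For a 1 GB bank and a 408 MB reservation, derive_layout(MB(408), 0xC0000000u) gives high_memory = 0xE5800000 and vmalloc_start = 0xE6000000, which agrees with the lowmem and vmalloc lines of the boot log shown earlier.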

    example 1

    In this example, the lowmem address space in the virtual memory layout is as large as the lowmem address space in the physical memory layout. It’s correct to use the linear mapping function to transform addresses between the two spaces.

    The vmalloc address space in virtual memory layout is almost as large as the highmem address space in physical memory layout. It’s incorrect to use the linear mapping function to transform addresses between the two address spaces.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 1;
    bank[0].start = 0x80000000;
    bank[0].size = 0x40000000;
    vmalloc size is 400 MB
    mapping function:
    #define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
    

    memory_layout_ex_01

    example 2

    Compared with example 1, the vmalloc address space in the virtual memory layout is smaller, while the lowmem address space in the physical memory layout is larger.

    Both the vmalloc address space and the lowmem address space are limited resources. Growing one shrinks the other, so sizing them is a trade-off.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 1;
    bank[0].start = 0x80000000;
    bank[0].size = 0x40000000;
    vmalloc size is 240 MB
    linear mapping function:
    #define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
    

    memory_layout_ex_02

    example 3

    If the physical memory size increases to 2 GB, the highmem address space increases accordingly. In this case, the kernel's virtual address space is smaller than the physical memory. It’s impossible for the linear mapping to cover every physical address from the kernel virtual address space.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 1;
    bank[0].start = 0x80000000;
    bank[0].size = 0x80000000;
    vmalloc size is 400 MB
    linear mapping function:
    #define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
    

    memory_layout_ex_03

    example 4

    If there is a hole in the physical memory layout, the lowmem address space in the virtual memory layout can be larger than the lowmem address space in the physical memory layout. During boot initialization, the kernel traverses all lowmem banks and sets up page tables for them. The part of the virtual address space that linearly maps to the hole in the physical memory layout must not be accessed, since the corresponding page tables are not set up.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 2;
    bank[0].start = 0x80000000;
    bank[0].size = 136 MB;
    bank[1].start = 0x90000000;
    bank[1].size = 768 MB;
    vmalloc size is 400 MB
    mapping function:
    #define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
    

    memory_layout_ex_04

    example 5

    If the hole is in the highmem address space rather than the lowmem address space of the physical memory layout, then the lowmem in the physical memory layout is larger. This gives a larger lowmem address space without shrinking any other region.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 2;
    bank[0].start = 0x80000000;
    bank[0].size = 768 MB;
    bank[1].start = 0xB7800000;
    bank[1].size = 136 MB;
    vmalloc size is 400 MB
    mapping function:
    #define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
    

    memory_layout_ex_05
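
To compare examples 4 and 5 numerically, the sketch below adds up how many bytes of each bank fall below the lowmem/highmem split. It assumes that "vmalloc size is 400 MB" means a 400 MB reservation below VMALLOC_END and ignores any further rounding the kernel may apply; struct bank and lowmem_bytes are hypothetical names.

```c
#include <stdint.h>

#define MB(x)        ((uint64_t)(x) << 20)
#define PAGE_OFFSET  0xC0000000ull
#define PHYS_OFFSET  0x80000000ull
#define VMALLOC_END  0xFF000000ull

struct bank {
    uint64_t start;
    uint64_t size;
};

/* Sum the bytes of all banks that lie below the lowmem/highmem split. */
uint64_t lowmem_bytes(const struct bank *banks, int n, uint64_t vmalloc_reserve)
{
    /* Physical address that vmalloc_min maps to under the linear mapping. */
    uint64_t split = (VMALLOC_END - vmalloc_reserve) - PAGE_OFFSET + PHYS_OFFSET;
    uint64_t total = 0;

    for (int i = 0; i < n; i++) {
        uint64_t start = banks[i].start;
        uint64_t end = start + banks[i].size;

        if (start >= split)
            continue;            /* bank is entirely highmem */
        if (end > split)
            end = split;         /* bank straddles the split: count the low part */
        total += end - start;
    }
    return total;
}
```

Under these assumptions the banks of example 4 yield 488 MB of lowmem, while the banks of example 5 yield 608 MB, because the hole no longer consumes part of the linearly mapped range.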

    example 6

    If a hole in the physical memory layout is inevitable, then tweaking the linear mapping function might help increase the size of the lowmem address space without shrinking any other region.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 2;
    bank[0].start = 0x80000000;
    bank[0].size = 136 MB;
    bank[1].start = 0x90000000;
    bank[1].size = 768 MB;
    vmalloc size is 400 MB
    mapping function:
    #define __phys_to_virt(phys)                            \
            (unsigned long)\
            ((MEM_HOLE_END_PHYS_OFFSET && ((phys) >= MEM_HOLE_END_PHYS_OFFSET)) ? \
            (phys) - MEM_HOLE_END_PHYS_OFFSET + MEM_HOLE_PAGE_OFFSET :      \
            (phys) - PHYS_OFFSET + PAGE_OFFSET)
    
    #define __virt_to_phys(virt)                            \
            (unsigned long)\
            ((MEM_HOLE_END_PHYS_OFFSET && ((virt) >= MEM_HOLE_PAGE_OFFSET)) ? \
            (virt) - MEM_HOLE_PAGE_OFFSET + MEM_HOLE_END_PHYS_OFFSET :      \
            (virt) - PAGE_OFFSET + PHYS_OFFSET)
    

    memory_layout_ex_06
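
The hole-aware macros can be tried out with concrete numbers. In the sketch below, the two MEM_HOLE_* values are assumptions derived from the banks of example 6 (the hole ends where bank[1] starts, and bank[1] is mapped directly after the 136 MB of bank[0]); in the real kernel they may be computed at boot. hole_phys_to_virt and hole_virt_to_phys are hypothetical function names.

```c
#include <stdint.h>

#define PAGE_OFFSET               0xC0000000u
#define PHYS_OFFSET               0x80000000u
/* Assumed for example 6: physical end of the hole and the virtual
 * address at which the memory after the hole is mapped. */
#define MEM_HOLE_END_PHYS_OFFSET  0x90000000u
#define MEM_HOLE_PAGE_OFFSET      (PAGE_OFFSET + (136u << 20))

uint32_t hole_phys_to_virt(uint32_t phys)
{
    /* Addresses at or above the end of the hole use the shifted mapping. */
    if (MEM_HOLE_END_PHYS_OFFSET && phys >= MEM_HOLE_END_PHYS_OFFSET)
        return phys - MEM_HOLE_END_PHYS_OFFSET + MEM_HOLE_PAGE_OFFSET;
    return phys - PHYS_OFFSET + PAGE_OFFSET;
}

uint32_t hole_virt_to_phys(uint32_t virt)
{
    /* Virtual addresses past bank[0] map to memory after the hole. */
    if (MEM_HOLE_END_PHYS_OFFSET && virt >= MEM_HOLE_PAGE_OFFSET)
        return virt - MEM_HOLE_PAGE_OFFSET + MEM_HOLE_END_PHYS_OFFSET;
    return virt - PAGE_OFFSET + PHYS_OFFSET;
}
```

With these values, the last byte of bank[0] (0x887FFFFF) maps to 0xC87FFFFF and the first byte of bank[1] (0x90000000) maps to 0xC8800000, so the two banks are contiguous in virtual space and no lowmem virtual addresses are spent on the hole.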

    Virtual Memory Reclaim
    The Virtual Memory Reclaim configuration of the msm kernel aims to increase memory utilization by tweaking the memory mapping. The examples below show how this configuration works.

    choice
            prompt "Virtual Memory Reclaim"
            default ENABLE_VMALLOC_SAVING
            help 
              Select the method of reclaiming virtual memory
    
    config DONT_MAP_HOLE_AFTER_MEMBANK0
            bool "Map around the largest hole"
            help 
              Do not map the memory belonging to the largest hole 
              into the virtual space. This results in more lowmem.
              If multiple holes are present, only the largest hole 
              in the first 256MB of memory is not mapped.
    
    config ENABLE_VMALLOC_SAVING
            bool "Reclaim memory for each subsystem"
            help 
              Enable this config to reclaim the virtual space belonging
              to any subsystem which is expected to have a lifetime of
              the entire system. This feature allows lowmem to be non- 
              contiguous.
    
    config NO_VM_RECLAIM
            bool "Do not reclaim memory"
            help 
              Do not reclaim any memory. This might result in less lowmem
              and wasting virtual memory space which could otherwise be
              reclaimed by using any of the other two config options.
    

    example 7

    The configuration adopts the default linear mapping function.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 3;
    bank[0].start = 0x80000000;
    bank[0].size = 0x05B00000; /* 91 MB */
    bank[1].start = 0x05D00000; 
    bank[1].size = 0x01800000; /* 24 MB */
    bank[2].start = 0x12000000;
    bank[2].size = 0x1ED00000; /* 493 MB */
    vmalloc size is 400 MB
    Virtual Memory Reclaim: CONFIG_NO_VM_RECLAIM
    mapping function:
    #define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
    
    Virtual kernel memory layout:                                                                                                                                                                                
        vector  : 0xffff0000 - 0xffff1000   (   4 kB) 
        fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB) 
        vmalloc : 0xe6800000 - 0xff000000   ( 392 MB) 
        lowmem  : 0xc0000000 - 0xe6000000   ( 608 MB) 
        pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB) 
        modules : 0xbf000000 - 0xbfe00000   (  14 MB) 
          .text : 0xc0008000 - 0xc0e0a358   (14345 kB) 
          .init : 0xc0f00000 - 0xc10370c0   (1245 kB) 
          .data : 0xc1038000 - 0xc1135c98   (1016 kB) 
           .bss : 0xc1135cbc - 0xc13ba7b0   (2579 kB)
    
    MemTotal:        1868464 kB
    HighTotal:       1460224 kB
    LowTotal:         408240 kB
    

    memory_layout_ex_07

    example 8

    This configuration tweaks the linear mapping function to avoid mapping the largest hole in the first 256 MB of the physical memory layout. It helps increase the size of the lowmem address space in the physical memory layout.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 3;
    bank[0].start = 0x80000000;
    bank[0].size = 0x05B00000; /* 91 MB */
    bank[1].start = 0x05D00000; 
    bank[1].size = 0x01800000; /* 24 MB */
    bank[2].start = 0x12000000;
    bank[2].size = 0x1ED00000; /* 493 MB */
    vmalloc size is 400 MB
    Virtual Memory Reclaim: CONFIG_DONT_MAP_HOLE_AFTER_MEMBANK0
    mapping function:
    #define __phys_to_virt(phys)                            \
            (unsigned long)\
            ((MEM_HOLE_END_PHYS_OFFSET && ((phys) >= MEM_HOLE_END_PHYS_OFFSET)) ? \
            (phys) - MEM_HOLE_END_PHYS_OFFSET + MEM_HOLE_PAGE_OFFSET :      \
            (phys) - PHYS_OFFSET + PAGE_OFFSET)
    
    #define __virt_to_phys(virt)                            \
            (unsigned long)\
            ((MEM_HOLE_END_PHYS_OFFSET && ((virt) >= MEM_HOLE_PAGE_OFFSET)) ? \
            (virt) - MEM_HOLE_PAGE_OFFSET + MEM_HOLE_END_PHYS_OFFSET :      \
            (virt) - PAGE_OFFSET + PHYS_OFFSET)
    
    Virtual kernel memory layout:                                                                                                                                                                                
        vector  : 0xffff0000 - 0xffff1000   (   4 kB) 
        fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB) 
        vmalloc : 0xe6800000 - 0xff000000   ( 392 MB) 
        lowmem  : 0xc0000000 - 0xe6000000   ( 608 MB) 
        pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB) 
        modules : 0xbf000000 - 0xbfe00000   (  14 MB) 
          .text : 0xc0008000 - 0xc0e0a358   (14345 kB) 
          .init : 0xc0f00000 - 0xc10370c0   (1245 kB) 
          .data : 0xc1038000 - 0xc1135c98   (1016 kB) 
           .bss : 0xc1135cbc - 0xc13ba7b0   (2579 kB)
    
    MemTotal:        1868008 kB
    HighTotal:       1286144 kB
    LowTotal:         581864 kB
    

    memory_layout_ex_08

    example 9

    This configuration changes how the vmalloc address space is distributed in order to reclaim unused lowmem address space in the virtual address space. It can also increase the size of the lowmem address space in the physical memory layout.

    This strategy requires extensive changes to vmalloc-related code.

    CONFIG_PAGE_OFFSET 0xC0000000
    CONFIG_PHYS_OFFSET 0x80000000
    meminfo
    nr_banks = 3;
    bank[0].start = 0x80000000;
    bank[0].size = 0x05B00000; /* 91 MB */
    bank[1].start = 0x05D00000; 
    bank[1].size = 0x01800000; /* 24 MB */
    bank[2].start = 0x12000000;
    bank[2].size = 0x1ED00000; /* 493 MB */
    vmalloc size is 400 MB
    Virtual Memory Reclaim: CONFIG_ENABLE_VMALLOC_SAVING
    mapping function:
    #define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
    
    Virtual kernel memory layout:
        vector  : 0xffff0000 - 0xffff1000   (   4 kB)
        fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
               vmalloc : 0xf1000000 - 0xff000000   ( 224 MB)
                lowmem : 0xd2000000 - 0xf0d00000   ( 493 MB)
               vmalloc : 0xc7500000 - 0xd2000000   ( 171 MB)
                lowmem : 0xc5d00000 - 0xc7500000   (  24 MB)
               vmalloc : 0xc5b00000 - 0xc5d00000   (   2 MB)
                lowmem : 0xc0000000 - 0xc5b00000   (  91 MB)
                ……
    
    MemTotal:        1868040 kB
    HighTotal:       1283072 kB
    LowTotal:         584968 kB
    

    memory_layout_ex_09
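
One way to read the boot log of example 9 is that the virtual gaps between lowmem regions are handed to vmalloc. The sketch below only computes those gaps from the lowmem ranges printed in the log; it is an illustration and not the msm kernel's implementation. struct vrange and gaps_between are hypothetical names.

```c
#include <stdint.h>

struct vrange {
    uint32_t start;   /* inclusive */
    uint32_t end;     /* exclusive */
};

/* Given lowmem ranges sorted by address, write the gaps between
 * consecutive ranges into gaps[] (room for n - 1 entries needed).
 * Returns the number of gaps found. */
int gaps_between(const struct vrange *low, int n, struct vrange *gaps)
{
    int m = 0;

    for (int i = 0; i + 1 < n; i++) {
        if (low[i].end < low[i + 1].start) {
            gaps[m].start = low[i].end;
            gaps[m].end = low[i + 1].start;
            m++;
        }
    }
    return m;
}
```

Fed with the three lowmem ranges from the log, this yields 0xc5b00000 to 0xc5d00000 (2 MB) and 0xc7500000 to 0xd2000000 (171 MB), which are the two additional vmalloc regions the log reports between the lowmem regions.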

    android: bootmode

    July 21, 2015

    This post discusses what bootmode is, who sets it, and how user space processes read it. The referenced source code is android-5.1_LMY47V.

    The bootloader sets up the kernel command line and passes it to the kernel as an argument. The kernel parses the command line, sets up parameters based on it, and exports it in /proc/cmdline.

    In android, the init process reads the kernel command line from /proc/cmdline and sets up system properties accordingly. If the kernel command line contains an argument androidboot.foo=abcd, init sets the property ro.boot.foo=abcd. For example, if it contains androidboot.mode=mfg, init sets the property ro.boot.mode=mfg.

    static void import_kernel_nv(char *name, int for_emulator)
    {
        char *value = strchr(name, '=');
        int name_len = strlen(name);
    
        if (value == 0) return;
        *value++ = 0; 
        if (name_len == 0) return;
    
        if (for_emulator) {
            /* in the emulator, export any kernel option with the
             * ro.kernel. prefix */
            char buff[PROP_NAME_MAX];
            int len = snprintf( buff, sizeof(buff), "ro.kernel.%s", name );
    
            if (len < (int)sizeof(buff))
                property_set( buff, value );
            return;
        }    
    
        if (!strcmp(name,"qemu")) {
            strlcpy(qemu, value, sizeof(qemu));
        } else if (!strncmp(name, "androidboot.", 12) && name_len > 12) {
            const char *boot_prop_name = name + 12;
            char prop[PROP_NAME_MAX];
            int cnt; 
    
            cnt = snprintf(prop, sizeof(prop), "ro.boot.%s", boot_prop_name);
            if (cnt < PROP_NAME_MAX)
                property_set(prop, value);
        }    
    }
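
For experimenting outside of init, the lookup of a single key in a command line string can be sketched as follows. This is a simplified stand-alone helper, not code from init: cmdline_get is a hypothetical name, and the parser assumes space-separated key=value tokens without quoting.

```c
#include <stddef.h>
#include <string.h>

/* Find "key=value" in a space-separated command line and copy the value
 * into out (always NUL-terminated, truncated if needed).
 * Returns 0 if the key was found, -1 otherwise. */
int cmdline_get(const char *cmdline, const char *key, char *out, size_t out_len)
{
    size_t key_len = strlen(key);
    const char *p = cmdline;

    if (out_len == 0)
        return -1;

    while (*p) {
        while (*p == ' ')
            p++;                                /* skip separators */

        size_t tok_len = strcspn(p, " \n");     /* length of this token */
        if (tok_len > key_len &&
            strncmp(p, key, key_len) == 0 && p[key_len] == '=') {
            size_t val_len = tok_len - key_len - 1;
            if (val_len >= out_len)
                val_len = out_len - 1;
            memcpy(out, p + key_len + 1, val_len);
            out[val_len] = '\0';
            return 0;
        }

        p += tok_len;
        if (*p == '\n')
            p++;                                /* step over a trailing newline */
    }
    return -1;
}
```

For a command line containing androidboot.mode=mfg, cmdline_get(line, "androidboot.mode", buf, sizeof(buf)) stores "mfg" in buf, which is the value init would publish as ro.boot.mode.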
    

    If property ro.boot.mode=mfg, init process will set property ro.bootmode=mfg accordingly. If property ro.boot.mode is not set, init process will set property ro.bootmode=unknown.

    static void export_kernel_boot_props(void)
    {
        char tmp[PROP_VALUE_MAX];
        int ret; 
        unsigned i;
        struct {
            const char *src_prop;
            const char *dest_prop;
            const char *def_val;
        } prop_map[] = {
            { "ro.boot.serialno", "ro.serialno", "", },
            { "ro.boot.mode", "ro.bootmode", "unknown", },
            { "ro.boot.baseband", "ro.baseband", "unknown", },
            { "ro.boot.bootloader", "ro.bootloader", "unknown", },
        };   
    
        for (i = 0; i < ARRAY_SIZE(prop_map); i++) {
            ret = property_get(prop_map[i].src_prop, tmp);
            if (ret > 0) 
                property_set(prop_map[i].dest_prop, tmp);
            else 
                property_set(prop_map[i].dest_prop, prop_map[i].def_val);
        }
        ......
    }
    

    User space processes can get the bootmode by reading the property ro.bootmode.

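    #define LOG_TAG "bootmode"   /* tag shown in logcat; any string works */

    #include <cutils/log.h>
    #include <cutils/properties.h>
    
    void test(void)
    {
        char buf[PROPERTY_VALUE_MAX];

        /* fall back to "unknown" if the property is not set */
        property_get("ro.bootmode", buf, "unknown");
        ALOGD("bootmode is %s", buf);
    }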
    

    arm64: time: local_clock

    July 21, 2015

    This article discusses the API local_clock(), including its applications and its implementations under different kernel configurations. The referenced source code is kernel 3.18.0.

    The API local_clock() provides monotonic local cpu time. One of its most common applications is that each kernel log message is prefixed with the local cpu time.

    [ 0.000000] mpm_init_irq_domain(): Cannot find irq controller for qcom,gpio-parent
    [ 0.000000] MPM 1 irq mapping errored -517
    [ 0.000000] Architected cp15 and mmio timer(s) running at 19.20MHz (virt/virt).
    [ 0.000008] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 3579139424256ns
    [ 0.009087] Console: colour dummy device 80x25
    [ 0.013757] Calibrating delay loop (skipped), value calculated using timer frequency.. 38.40 BogoMIPS (lpj=192000)
    [ 0.023069] pid_max: default: 32768 minimum: 301

    The printk code calls local_clock() to record the local cpu time in the ts_nsec field of each log message.

    /* insert record into the buffer, discard old ones, update heads */
    static int log_store(int facility, int level,
                         enum log_flags flags, u64 ts_nsec,
                         const char *dict, u16 dict_len,
                         const char *text, u16 text_len)
    {
    ......
            /* fill message */
            msg = (struct printk_log *)(log_buf + log_next_idx);
            memcpy(log_text(msg), text, text_len);
            msg->text_len = text_len;
            if (trunc_msg_len) {
                    memcpy(log_text(msg) + text_len, trunc_msg, trunc_msg_len);
                    msg->text_len += trunc_msg_len;
            }    
            memcpy(log_dict(msg), dict, dict_len);
            msg->dict_len = dict_len;
            msg->facility = facility;
            msg->level = level & 7; 
            msg->flags = flags & 0x1f;
            LOG_MAGIC(msg);
            if (ts_nsec > 0) 
                    msg->ts_nsec = ts_nsec;
            else 
                    msg->ts_nsec = local_clock();
            memset(log_dict(msg) + dict_len, 0, pad_len);
            msg->len = size;
    ......
    }
    

    While writing logs to consoles, the printk code prepends the time to each log message if the variable printk_time is true. Driver developers can enable printk_time by setting CONFIG_PRINTK_TIME=y at compile time. Users can enable printk_time at runtime with the command "echo Y > /sys/module/printk/parameters/time".

    static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME);
    module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
    
    static size_t print_time(u64 ts, char *buf)
    {
            unsigned long rem_nsec;
    
            if (!printk_time)
                    return 0;
    
            rem_nsec = do_div(ts, 1000000000);
    
            if (!buf)
                    return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
    
            return sprintf(buf, "[%5lu.%06lu] ",
                           (unsigned long)ts, rem_nsec / 1000);
    }
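
The conversion from ts_nsec to the familiar log prefix can be reproduced in user space. The sketch below mirrors the arithmetic of print_time() using ordinary 64-bit division instead of do_div(); format_ts is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a nanosecond timestamp as "[sssss.uuuuuu] " into buf.
 * Returns what snprintf() returns (the length that was or would be written). */
int format_ts(uint64_t ts_nsec, char *buf, size_t len)
{
    unsigned long long sec = ts_nsec / 1000000000ull;       /* whole seconds */
    unsigned long long rem_nsec = ts_nsec % 1000000000ull;  /* leftover nanoseconds */

    /* The fractional part is printed in microseconds, zero padded to 6 digits. */
    return snprintf(buf, len, "[%5llu.%06llu] ", sec, rem_nsec / 1000);
}
```

For example, a ts_nsec of 9087000 is formatted as "[    0.009087] ". Because of the division by 1000, anything finer than a microsecond is dropped from the printed prefix.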
    

    In local_clock(), local refers to the cpu on which the current context is running, so local_clock() might return different values on different cpus. Some hardware (such as the x86 TSC) behaves this way and needs CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y to let the kernel know that the local_clock() value might drift between cpus. This might benefit system performance, since reading local_clock() does not require holding a global synchronization lock.

    In the projects I am working on, CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is unset. Both local_clock() and cpu_clock(int cpu) directly call sched_clock().

    #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
    ......
    #else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
    
    void sched_clock_init(void)
    {
            sched_clock_running = 1;
    }
    
    u64 sched_clock_cpu(int cpu)
    {
            if (unlikely(!sched_clock_running))
                    return 0;
    
            return sched_clock();
    }
    
    u64 cpu_clock(int cpu)
    {
            return sched_clock();
    }
    
    u64 local_clock(void)
    {
            return sched_clock();
    }
    
    #endif /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
    

    The kernel implements a default weak sched_clock() that is based on jiffies. jiffies is a global variable recording the number of ticks since boot; it is updated by the system timer interrupt handler at frequency HZ. In the arm architecture, HZ is 100. The resolution of this sched_clock() implementation is therefore limited to 1/HZ seconds, which is 10 ms when HZ is 100.

    /*
     * Scheduler clock - returns current time in nanosec units.
     * This is default implementation.
     * Architectures and sub-architectures can override this.
     */
    unsigned long long __weak sched_clock(void)
    {
            return (unsigned long long)(jiffies - INITIAL_JIFFIES)
                                            * (NSEC_PER_SEC / HZ);
    }
    EXPORT_SYMBOL_GPL(sched_clock);
    
    #ifndef __ASM_GENERIC_PARAM_H
    #define __ASM_GENERIC_PARAM_H
    
    #include <uapi/asm-generic/param.h>
    
    # undef HZ
    # define HZ             CONFIG_HZ       /* Internal kernel timer frequency */
    # define USER_HZ        100             /* some user interfaces are */
    # define CLOCKS_PER_SEC (USER_HZ)       /* in "ticks" like times() */
    #endif /* __ASM_GENERIC_PARAM_H */
    

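    In arm64, CONFIG_GENERIC_SCHED_CLOCK is set, so kernel/time/sched_clock.c is built in and its sched_clock() overrides the default weak sched_clock() in kernel/sched/clock.c. This implementation also reads jiffies by default, through the read_sched_clock function pointer, but it does not convert jiffies to nanoseconds directly. Instead it takes the number of cycles elapsed since a saved epoch (epoch_cyc), masks the difference with sched_clock_mask so that counter wrap-around is handled, scales it to nanoseconds with cd.mult and cd.shift, and adds the result to epoch_ns. The epoch pair is read under a seqcount so that a reader never sees a half-updated pair.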

    config ARM64
            def_bool y
            ......
            select GENERIC_IRQ_PROBE
            select GENERIC_IRQ_SHOW
            select GENERIC_SCHED_CLOCK
    
    obj-y += time.o timer.o hrtimer.o itimer.o posix-timers.o posix-cpu-timers.o
    obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o
    obj-y += timeconv.o posix-clock.o alarmtimer.o
    
    obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD)         += clockevents.o
    obj-$(CONFIG_GENERIC_CLOCKEVENTS)               += tick-common.o
    ifeq ($(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST),y)
     obj-y                                          += tick-broadcast.o
     obj-$(CONFIG_TICK_ONESHOT)                     += tick-broadcast-hrtimer.o
    endif
    obj-$(CONFIG_GENERIC_SCHED_CLOCK)               += sched_clock.o
    obj-$(CONFIG_TICK_ONESHOT)                      += tick-oneshot.o
    obj-$(CONFIG_TICK_ONESHOT)                      += tick-sched.o
    obj-$(CONFIG_TIMER_STATS)                       += timer_stats.o
    obj-$(CONFIG_DEBUG_FS)                          += timekeeping_debug.o
    obj-$(CONFIG_TEST_UDELAY)                       += udelay_test.o
    
    static u64 notrace jiffy_sched_clock_read(void)
    {
            /*  
             * We don't need to use get_jiffies_64 on 32-bit arches here
             * because we register with BITS_PER_LONG
             */
            return (u64)(jiffies - INITIAL_JIFFIES);
    }
    
    static u64 __read_mostly (*read_sched_clock)(void) = jiffy_sched_clock_read;
    
    static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
    {
            return (cyc * mult) >> shift;
    }
    
    unsigned long long notrace sched_clock(void)
    {
            u64 epoch_ns;
            u64 epoch_cyc;
            u64 cyc;
            unsigned long seq;
    
            if (cd.suspended)
                    return cd.epoch_ns;
    
            do {
                    seq = raw_read_seqcount_begin(&cd.seq);
                    epoch_cyc = cd.epoch_cyc;
                    epoch_ns = cd.epoch_ns;
            } while (read_seqcount_retry(&cd.seq, seq));
    
            cyc = read_sched_clock();
            cyc = (cyc - epoch_cyc) & sched_clock_mask;
            return epoch_ns + cyc_to_ns(cyc, cd.mult, cd.shift);
    }
    

    In conclusion, in arm64, by default local_clock() provides a monotonic clock based on jiffies. Although its name suggests a clock for the local (current) CPU, local_clock() directly calls sched_clock(), which is based on jiffies and does not take the CPU into consideration.

    Linux kernel mailing list

    July 11, 2015

    Linux development is based on mailing lists, and you may learn a lot by subscribing to some of them. All you need to do is send an email to Majordomo@vger.kernel.org. Majordomo is an automated system that responds to the commands in the body of an email.

    Below are the commands Majordomo recognises:

  • lists: list all Linux mailing lists
  • subscribe example-mailing-list (Ex: subscribe linux-newbie)
  • unsubscribe example-mailing-list (Ex: unsubscribe linux-newbie)

    There are many mailing lists. linux-newbie is for newcomers to ask questions; before posting, make sure no one has asked the same question already. kernel-janitors is for those who want to clean up the kernel, and sending a cleanup patch is one way to make your first contribution. Doing many cleanups is not encouraged, though, since real problems matter most. You can also choose a list for the subsystem you are interested in: I am interested in the memory management subsystem, so I subscribe to mm-commits.

    The linux-kernel mailing list is the most important one. If you subscribe to it, expect a lot of email in your inbox. Think twice before sending email to it, because many busy people read this list.

    Reference:

  • http://vger.kernel.org/vger-lists.html
  • http://www.tux.org/lkml/#s3-1

    Build your own kernel in Ubuntu 14.04

    July 11, 2015

    Download Linux kernel source code
    $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    $ cd linux

    Copy the current kernel configuration to the working directory
    $ cp /boot/config-$(uname -r) .config

    Make configuration based on .config
    $ make olddefconfig

    Build bzImage and modules
    $ make

    Install kernel and modules
    $ sudo make modules_install install

    Reboot device
    $ sudo reboot

