Archive for the ‘cma’ Category

kernel: mm: cma: allocation failure due to buddy system

August 12, 2015

This post discusses a CMA (Contiguous Memory Allocator) allocation failure caused by the buddy system. It is similar to kernel: mm: cma: allocation failure due to ksm . The reference source code here is the Qualcomm MSM kernel release 3.4.0.

CMA allocation code flow
dma_alloc_from_contiguous is the entry API for CMA allocation. It manages the CMA heap with a bitmap and leverages alloc_contig_range to allocate contiguous pages.
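
A condensed sketch of this path, based on the 3.4-era drivers/base/dma-contiguous.c (alignment details, error handling, and retry throttling trimmed; not necessarily identical to the MSM tree), looks roughly like this:

struct page *dma_alloc_from_contiguous(struct device *dev, int count,
                                       unsigned int align)
{
        struct cma *cma = dev_get_cma_area(dev);   /* per-device CMA region */
        unsigned long pageno, pfn, start = 0;
        struct page *page = NULL;
        int ret;

        mutex_lock(&cma_mutex);
        for (;;) {
                /* find a free run of 'count' pages in the region's bitmap */
                pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
                                                    start, count,
                                                    (1 << align) - 1);
                if (pageno >= cma->count)
                        break;

                /* ask the buddy system for that exact pfn range */
                pfn = cma->base_pfn + pageno;
                ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
                if (ret == 0) {
                        bitmap_set(cma->bitmap, pageno, count);
                        page = pfn_to_page(pfn);
                        break;
                } else if (ret != -EBUSY) {
                        break;
                }
                /* -EBUSY: retry at the next position in the bitmap */
                start = pageno + (1 << align);
        }
        mutex_unlock(&cma_mutex);
        return page;
}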

Two APIs can get pages from the buddy system: alloc_pages and alloc_contig_range. The former allocates a single block of pages of the requested order; the latter allocates a specific requested pfn range.
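
Their prototypes make the difference clear (alloc_pages is actually a wrapper macro in the kernel; it is shown as a plain prototype here for illustration):

/* allocate one block of 2^order pages, placed wherever the buddy system likes */
struct page *alloc_pages(gfp_t gfp_mask, unsigned int order);

/* allocate exactly the pfn range [start, end), used by CMA */
int alloc_contig_range(unsigned long start, unsigned long end,
                       unsigned migratetype);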

start_isolate_page_range sets all pageblocks that the requested region occupies to MIGRATE_ISOLATE. It then moves all free pages in these pageblocks to the isolate freelist. After this step, pages freed by users are supposed to return to the isolate freelist, where they cannot be allocated by anyone else.

__alloc_contig_migrate_range migrates all used pages to other pageblocks. Since CMA pageblocks can only be used as a fallback for movable page allocations, it is assumed that all used pages in CMA pageblocks can be migrated. After these pages are migrated and freed, they return to the isolate freelist because their pageblocks are already isolated.

test_pages_isolated checks whether all pages in the requested region are free and on the isolate freelist.

isolate_freepages_range splits all free pages in the region into order-0 free pages. It then loops over the region, taking each order-0 page off the freelist and calling kernel_map_pages on it.

undo_isolate_page_range restores the migration type of all pageblocks to MIGRATE_CMA and therefore moves all remaining free pages back to the CMA freelist.
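
Putting these steps together, the body of alloc_contig_range looks roughly like the sketch below. It is a simplified illustration of the step order described above, with alignment handling, the PCP drain, and head/tail trimming omitted, so it is not the exact source:

int alloc_contig_range(unsigned long start, unsigned long end,
                       unsigned migratetype)
{
        unsigned long outer_start, outer_end;
        int ret;

        /* 1. mark all covered pageblocks MIGRATE_ISOLATE and move their
         *    free pages to the isolate freelist */
        ret = start_isolate_page_range(pfn_max_align_down(start),
                                       pfn_max_align_up(end), migratetype);
        if (ret)
                goto done;

        /* 2. migrate every used page out of the region */
        ret = __alloc_contig_migrate_range(start, end);
        if (ret)
                goto done;

        /* 3. verify the whole range is now free and isolated */
        outer_start = start;
        if (test_pages_isolated(outer_start, end)) {
                ret = -EBUSY;
                goto done;
        }

        /* 4. grab the isolated free pages as order-0 pages */
        outer_end = isolate_freepages_range(outer_start, end);
        if (!outer_end) {
                ret = -EBUSY;
                goto done;
        }
        /* head and tail of the over-aligned range would be freed back here */

done:
        /* 5. restore the pageblocks to MIGRATE_CMA */
        undo_isolate_page_range(pfn_max_align_down(start),
                                pfn_max_align_up(end), migratetype);
        return ret;
}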

[Figure: cma_alloc_01]

symptom: alloc_contig_range test_pages_isolated failed

<4>[80467.803487] c3  26510 alloc_contig_range test_pages_isolated(33a00, 33b00) failed 
<4>[80467.821804] c3  26510 alloc_contig_range test_pages_isolated(33b00, 33c00) failed 

The error log implies that test_pages_isolated failed. Within alloc_contig_range, after the first two steps complete successfully, all pages in the region are supposed to be free and on the isolate freelist.

int alloc_contig_range(unsigned long start, unsigned long end,
                       unsigned migratetype)
{
......
        /* Make sure the range is really isolated. */
        if (test_pages_isolated(outer_start, end)) {
                pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
                       outer_start, end);
                ret = -EBUSY;
                goto done;
        }
......
} 

analysis
In kernel: mm: cma: allocation failure due to ksm , the second step, __alloc_contig_migrate_range, fails to migrate KSM pages. In this case, however, test_pages_isolated fails even though the first two steps execute correctly.

The root cause of this symptom is that, after the first two steps complete, the PCP lists and other users return pages to the freelists, but some of these pages do not land on the isolate freelist. Free pages that are not on the isolate freelist can be allocated again, so in the third step test_pages_isolated fails because some of the returned free pages have already been handed out.

analysis: pages on the PCP list return to the freelist
The PCP (per-CPU pages) list is a cache of order-0 free pages. If the size of a PCP list falls below a threshold, it is refilled with free pages from the buddy freelists. When an order-0 free page is moved from a freelist into the PCP list, page_private(page) is set to the migration type of the freelist the page came from. If the PCP list size rises above a threshold, the surplus pages are moved back to the freelist whose migration type equals page_private(page).
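
This bookkeeping also happens when an order-0 page is freed. A condensed sketch of the 3.4-era free_hot_cold_page (mlock and vm-statistics handling trimmed; treat it as an illustration rather than the exact MSM source) shows where page_private is recorded and when the list is drained back to the buddy freelists:

void free_hot_cold_page(struct page *page, int cold)
{
        struct zone *zone = page_zone(page);
        struct per_cpu_pages *pcp;
        unsigned long flags;
        int migratetype;

        if (!free_pages_prepare(page, 0))
                return;

        /* remember which freelist this page logically belongs to */
        migratetype = get_pageblock_migratetype(page);
        set_page_private(page, migratetype);
        local_irq_save(flags);

        /* isolate pages bypass the PCP list; other non-PCP types are
         * cached on the movable PCP list */
        if (migratetype >= MIGRATE_PCPTYPES) {
                if (unlikely(migratetype == MIGRATE_ISOLATE)) {
                        free_one_page(zone, page, 0, migratetype);
                        goto out;
                }
                migratetype = MIGRATE_MOVABLE;
        }

        pcp = &this_cpu_ptr(zone->pageset)->pcp;
        if (cold)
                list_add_tail(&page->lru, &pcp->lists[migratetype]);
        else
                list_add(&page->lru, &pcp->lists[migratetype]);
        pcp->count++;

        /* above the high watermark: drain a batch back to the freelists,
         * where page_private(page) picks the target freelist */
        if (pcp->count >= pcp->high) {
                free_pcppages_bulk(zone, pcp->batch, pcp);
                pcp->count -= pcp->batch;
        }
out:
        local_irq_restore(flags);
}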

After the first two steps, if a page is drained from the PCP list back into a freelist, it can end up on the CMA freelist rather than the isolate freelist. This makes test_pages_isolated fail and the CMA allocation fail.

/*
 * Frees a number of pages from the PCP lists
 * Assumes all pages on list are in same zone, and of same order.
 * count is the number of pages to free.
 *
 * If the zone was previously in an "all pages pinned" state then look to
 * see if this freeing clears that state.
 *
 * And clear the zone's pages_scanned counter, to hold off the "all pages are
 * pinned" detection logic.
 */
static void free_pcppages_bulk(struct zone *zone, int count,
                                        struct per_cpu_pages *pcp)
{
......
                do {
                        page = list_entry(list->prev, struct page, lru);
                        mt = get_pageblock_migratetype(page);
                        if (likely(mt != MIGRATE_ISOLATE))
                                mt = page_private(page);

                        /* must delete as __free_one_page list manipulates */
                        list_del(&page->lru);
                        /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
                        __free_one_page(page, zone, 0, mt);
                        trace_mm_page_pcpu_drain(page, 0, mt);
                        if (likely(mt != MIGRATE_ISOLATE)) {
                                free++;
                                if (is_migrate_cma(mt))
                                        cma_free++;
                        }
                } while (--to_free && --batch_free && !list_empty(list));

......
}

analysis: allocated pages return to the freelist
When a user frees a page that would normally return to the CMA freelist, and the CMA driver finishes the first two steps at the same moment, the freed page returns to the CMA freelist rather than the isolate freelist, because the migration type was sampled before the pageblock became isolated. A free page sitting on the CMA freelist can be allocated again, so the third step, test_pages_isolated, fails and the CMA allocation fails.

static void free_one_page(struct zone *zone, struct page *page, int order,
                                int migratetype)
{
        spin_lock(&zone->lock);
        zone->pages_scanned = 0;

        __free_one_page(page, zone, order, migratetype);
        if (unlikely(migratetype != MIGRATE_ISOLATE))
                __mod_zone_freepage_state(zone, 1 << order, migratetype);
        spin_unlock(&zone->lock);
}
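
The window for this race is visible on the caller side. A condensed sketch of the 3.4-era __free_pages_ok (trimmed, and not necessarily identical to the MSM tree) shows that the migration type is sampled before free_one_page takes zone->lock, so an isolation that completes in between is never noticed:

static void __free_pages_ok(struct page *page, unsigned int order)
{
        unsigned long flags;

        if (!free_pages_prepare(page, order))
                return;

        local_irq_save(flags);
        /* the migration type is read here, without zone->lock; if the
         * pageblock is switched to MIGRATE_ISOLATE right after this read,
         * the page is still freed onto the old (CMA) freelist below */
        free_one_page(page_zone(page), page, order,
                      get_pageblock_migratetype(page));
        local_irq_restore(flags);
}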

analysis: snowball effect: increased allocation failure rate
After the first two steps, almost all pages in the region are free and on the isolate freelist. If one page is returned to the wrong freelist, it is likely to merge with many buddy pages on the isolate freelist and grow into a larger free block that sits on the wrong freelist. If even one of these free pages is then allocated, the third step fails. The CMA allocation failure rate increases under high memory pressure, because free pages that are not on the isolate freelist are allocated quickly.

analysis: snowball effect: incorrect MemFree accounting
Among free pages, those on the isolate freelist are not accounted in MemFree. If a page is returned to the CMA freelist and merges with other free pages on the isolate freelist, up to 4 MB of isolate free pages can turn into free pages on the CMA freelist. If these pages are allocated, the MemFree accounting may incorrectly increase by 4 MB after they are freed again. The MemFree accounting might end up 100 MB more than the correct value. On Android, this makes the low memory killer stop killing applications and keeps memory pressure high.
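
(For reference, the 4 MB figure matches the largest buddy block, assuming 4 KB pages and the usual MAX_ORDER of 11 on this platform: an order-10 block is 2^10 pages x 4 KB = 4096 KB = 4 MB.)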

conclusion
I encountered this problem in the Qualcomm MSM kernel release 3.4.0. The same root cause also makes the MemFree accounting incorrect. I fixed it locally by making sure free pages are returned to the correct freelist. The problem is formally solved in Linux 3.18 by the following commits:

  • mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  • mm/page_alloc: add freepage on isolate pageblock to correct buddy list
  • mm/page_alloc: move freepage counting logic to __free_one_page()

    kernel: mm: cma: allocation failure due to ksm

    August 11, 2015

    This post discusses a CMA (Contiguous Memory Allocator) allocation failure caused by KSM. The reference source code here is the Qualcomm MSM kernel release 3.4.0.

    CMA allocation code flow
    dma_alloc_from_contiguous is the entry API for CMA allocation. It manages the CMA heap with a bitmap and leverages alloc_contig_range to allocate contiguous pages.

    Two APIs can get pages from the buddy system: alloc_pages and alloc_contig_range. The former allocates a single block of pages of the requested order; the latter allocates a specific requested pfn range.

    start_isolate_page_range sets all pageblocks that the requested region occupies to MIGRATE_ISOLATE. It then moves all free pages in these pageblocks to the isolate freelist. After this step, pages freed by users are supposed to return to the isolate freelist, where they cannot be allocated by anyone else.

    __alloc_contig_migrate_range migrates all used pages to other pageblocks. Since CMA pageblocks can only be used as a fallback for movable page allocations, it is assumed that all used pages in CMA pageblocks can be migrated. After these pages are migrated and freed, they return to the isolate freelist because their pageblocks are already isolated.

    test_pages_isolated checks whether all pages in the requested region are free and on the isolate freelist.

    isolate_freepages_range splits all free pages in the region into order-0 free pages. It then loops over the region, taking each order-0 page off the freelist and calling kernel_map_pages on it.

    undo_isolate_page_range restores the migration type of all pageblocks to MIGRATE_CMA and therefore moves all remaining free pages back to the CMA freelist.

    [Figure: cma_alloc_01]

    symptom: alloc_contig_range test_pages_isolated failed

    <4>[80467.803487] c3  26510 alloc_contig_range test_pages_isolated(33a00, 33b00) failed 
    <4>[80467.821804] c3  26510 alloc_contig_range test_pages_isolated(33b00, 33c00) failed 
    

    The error log implies that test_pages_isolated failed. Within alloc_contig_range, after the first two steps complete successfully, all pages in the region are supposed to be free and on the isolate freelist.

    int alloc_contig_range(unsigned long start, unsigned long end,
                           unsigned migratetype)
    {
    ......
            /* Make sure the range is really isolated. */
            if (test_pages_isolated(outer_start, end)) {
                    pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
                           outer_start, end);
                    ret = -EBUSY;
                    goto done;
            }
    ......
    } 
    

    analysis
    The second step, __alloc_contig_migrate_range, calls migrate_pages to migrate anonymous pages. If an anonymous page is a KSM page, __unmap_and_move returns -EBUSY.

    alloc_contig_range
    -> __alloc_contig_migrate_range
       -> migrate_pages
          -> unmap_and_move
             -> __unmap_and_move
    
    static int __unmap_and_move(struct page *page, struct page *newpage,
                            int force, bool offlining, enum migrate_mode mode)
    {
    ......
            /*   
             * Only memory hotplug's offline_pages() caller has locked out KSM,
             * and can safely migrate a KSM page.  The other cases have skipped
             * PageKsm along with PageReserved - but it is only now when we have
             * the page lock that we can be certain it will not go KSM beneath us
             * (KSM will not upgrade a page from PageAnon to PageKsm when it sees
             * its pagecount raised, but only here do we take the page lock which
             * serializes that).
             */
            if (PageKsm(page) && !offlining) {
                    rc = -EBUSY;
                    goto unlock;
            }
    ......
    } 
    

    However, migrate_pages ignores this error and still reports success. This design is perhaps because KSM pages do not support migration in kernel 3.4.0: if migration failure is permanent for a KSM page, there is no need to return an error code.

    int migrate_pages(struct list_head *from,
                    new_page_t get_new_page, unsigned long private, bool offlining,
                    enum migrate_mode mode)
    {
    ......
            for(pass = 0; pass < 10 && retry; pass++) {
                    retry = 0;
    
                    list_for_each_entry_safe(page, page2, from, lru) {
                            cond_resched();
    
                            rc = unmap_and_move(get_new_page, private,
                                                    page, pass > 2, offlining,
                                                    mode);
    
                            switch(rc) {
                            case -ENOMEM:
                                    goto out;
                            case -EAGAIN:
                                    retry++;
                                    trace_migrate_retry(retry);
                                    break;
                            case 0:
                                    nr_succeeded++;
                                    break;
                            default:
                                    /* Permanent failure */
                                    nr_failed++;
                                    break;
                            }
                    }
            }
            rc = 0;
    ......
    }
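
    The caller side explains why the failure only surfaces later. From my reading of the 3.4-era CMA code (sketched below with the compaction plumbing elided; not necessarily identical to the MSM tree), __alloc_contig_migrate_range folds a positive return value from migrate_pages, i.e. the count of pages that could not be migrated, into success:

    static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
    {
            int ret = 0;
    ......
                    ret = migrate_pages(&cc.migratepages,
                                        __alloc_contig_migrate_alloc,
                                        0, false, MIGRATE_SYNC);
    ......
            putback_lru_pages(&cc.migratepages);
            /* a positive return (number of pages that could not be migrated)
             * is treated as success; the un-migrated KSM page is only
             * noticed later by test_pages_isolated */
            return ret > 0 ? 0 : ret;
    }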
    

    conclusion
    In the Qualcomm MSM kernel release 3.4.0, KSM page migration is not supported, so CMA allocation fails easily when KSM is enabled. KSM page migration is supported in Linux 3.10.

