Archive for the ‘compaction’ Category

patch discussion: mm/compaction.c: add an is_via_compact_memory() helper

December 21, 2015

This post discusses mm/compaction.c: add an is_via_compact_memory() helper.

merge at
git: kernel/git/mhocko/mm.git
branch: since-4.3

/proc/sys/vm/compact_memory
The core compaction function is compact_zone(), which uses its compact_control argument to determine how to compact. There are three ways to reach compact_zone(): the allocation slow path, kswapd, and writing a value to /proc/sys/vm/compact_memory.

If the order of the compaction control is -1, the compaction was triggered by writing to /proc/sys/vm/compact_memory. Many code paths within compaction therefore test if (cc.order == -1) to detect this case.

This patch adds a helper function that makes the check explicit. Internally the helper still tests (cc.order == -1); the gain is readability at the call sites.
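
Based on the patch description, the helper most likely boils down to a one-line wrapper around the existing check; a minimal sketch:

/*
 * Sketch of the helper this patch adds: compaction requested through
 * /proc/sys/vm/compact_memory is marked by order == -1.
 */
static inline bool is_via_compact_memory(int order)
{
        return order == -1;
}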

1714 /* The written value is actually unused, all memory is compacted */
1715 int sysctl_compact_memory;
1716 
1717 /* This is the entry point for compacting all nodes via /proc/sys/vm */
1718 int sysctl_compaction_handler(struct ctl_table *table, int write,
1719                         void __user *buffer, size_t *length, loff_t *ppos)
1720 {
1721         if (write)
1722                 compact_nodes();
1723 
1724         return 0;
1725 }
1702 /* Compact all nodes in the system */
1703 static void compact_nodes(void)
1704 {
1705         int nid;
1706 
1707         /* Flush pending updates to the LRU lists */
1708         lru_add_drain_all();
1709 
1710         for_each_online_node(nid)
1711                 compact_node(nid);
1712 }
1691 static void compact_node(int nid)
1692 {
1693         struct compact_control cc = {
1694                 .order = -1,
1695                 .mode = MIGRATE_SYNC,
1696                 .ignore_skip_hint = true,
1697         };
1698 
1699         __compact_pgdat(NODE_DATA(nid), &cc);
1700 }
1638 /* Compact all zones within a node */
1639 static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
1640 {
1641         int zoneid;
1642         struct zone *zone;
1643 
1644         for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
1645 
1646                 zone = &pgdat->node_zones[zoneid];
1647                 if (!populated_zone(zone))
1648                         continue;
1649 
1650                 cc->nr_freepages = 0;
1651                 cc->nr_migratepages = 0;
1652                 cc->zone = zone;
1653                 INIT_LIST_HEAD(&cc->freepages);
1654                 INIT_LIST_HEAD(&cc->migratepages);
1655 
1656                 /*
1657                  * When called via /proc/sys/vm/compact_memory
1658                  * this makes sure we compact the whole zone regardless of
1659                  * cached scanner positions.
1660                  */
1661                 if (cc->order == -1)
1662                         __reset_isolation_suitable(zone);
1663 
1664                 if (cc->order == -1 || !compaction_deferred(zone, cc->order))
1665                         compact_zone(zone, cc);
1666 
1667                 if (cc->order > 0) {
1668                         if (zone_watermark_ok(zone, cc->order,
1669                                                 low_wmark_pages(zone), 0, 0))
1670                                 compaction_defer_reset(zone, cc->order, false);
1671                 }
1672 
1673                 VM_BUG_ON(!list_empty(&cc->freepages));
1674                 VM_BUG_ON(!list_empty(&cc->migratepages));
1675         }
1676 }
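
With the helper in place, the two order checks in __compact_pgdat() above would read roughly as follows (a sketch based on the patch description, not the exact hunk):

if (is_via_compact_memory(cc->order))
        __reset_isolation_suitable(zone);

if (is_via_compact_memory(cc->order) || !compaction_deferred(zone, cc->order))
        compact_zone(zone, cc);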

conclusion
This post discusses mm/compaction.c: add an is_via_compact_memory() helper. Before this patch, compaction code tests (cc.order == -1) to detect a compaction triggered by /proc/sys/vm/compact_memory. After the patch is merged, compaction code can call the helper is_via_compact_memory() instead.

patch discussion: mm, thp: restructure thp avoidance of light synchronous migration

December 5, 2015

This post discusses mm, thp: restructure thp avoidance of light synchronous migration.

call flow of compaction

__alloc_pages_nodemask()
   -> __alloc_pages_slowpath()
      -> wake_all_kswapds()
      -> get_page_from_freelist()
      -> __alloc_pages_direct_compact()
         -> try_to_compact_pages()
            -> compact_zone_order()
               -> compact_zone()
                  -> compact_finished()
                  -> isolate_migratepages()
                  -> migrate_pages()
      -> __alloc_pages_direct_reclaim()
      -> should_alloc_retry()
      -> warn_alloc_failed()
      -> return page

compaction in v3.15
The first call to __alloc_pages_direct_compact() in the slow path uses asynchronous migration (sync_migration == false); subsequent calls use synchronous migration, since sync_migration is set to true after the first attempt.

2576         /*
2577          * Try direct compaction. The first pass is asynchronous. Subsequent
2578          * attempts after direct reclaim are synchronous
2579          */
2580         page = __alloc_pages_direct_compact(gfp_mask, order,
2581                                         zonelist, high_zoneidx,
2582                                         nodemask,
2583                                         alloc_flags, preferred_zone,
2584                                         migratetype, sync_migration,
2585                                         &contended_compaction,
2586                                         &deferred_compaction,
2587                                         &did_some_progress);
2588         if (page)
2589                 goto got_pg;
2590         sync_migration = true;

compaction in v3.16
The first call to __alloc_pages_direct_compact() uses MIGRATE_ASYNC. Subsequent calls use MIGRATE_SYNC_LIGHT if __GFP_NO_KSWAPD is not set or the current thread is a kernel thread; otherwise they stay MIGRATE_ASYNC.

__GFP_NO_KSWAPD is used here as a hint that the caller is allocating transparent hugepages. However, it also affects other allocators that opportunistically try high-order pages first and do not want to disturb the system, such as the ion and kgsl allocators.

  • kernel: mm: gfp_mask and ion system heap allocation
  • kernel: mm: gfp_mask and kgsl allocator
The change here is due to mm, compaction: embed migration mode in compact_control and mm, thp: avoid excessive compaction latency during fault.

    2604         /*
    2605          * Try direct compaction. The first pass is asynchronous. Subsequent
    2606          * attempts after direct reclaim are synchronous
    2607          */
    2608         page = __alloc_pages_direct_compact(gfp_mask, order, zonelist,
    2609                                         high_zoneidx, nodemask, alloc_flags,
    2610                                         preferred_zone,
    2611                                         classzone_idx, migratetype,
    2612                                         migration_mode, &contended_compaction,
    2613                                         &deferred_compaction,
    2614                                         &did_some_progress);
    2615         if (page)
    2616                 goto got_pg;
    2617 
    2618         /*
    2619          * It can become very expensive to allocate transparent hugepages at
    2620          * fault, so use asynchronous memory compaction for THP unless it is
    2621          * khugepaged trying to collapse.
    2622          */
    2623         if (!(gfp_mask & __GFP_NO_KSWAPD) || (current->flags & PF_KTHREAD))
    2624                 migration_mode = MIGRATE_SYNC_LIGHT;
    2625 
    

    compaction in v3.17
    The first call to __alloc_pages_direct_compact() uses MIGRATE_ASYNC. Subsequent calls use MIGRATE_SYNC_LIGHT if (gfp_mask & GFP_TRANSHUGE) != GFP_TRANSHUGE or the current thread is a kernel thread; otherwise they stay MIGRATE_ASYNC.

    GFP_TRANSHUGE indicates that the caller is allocating transparent hugepages. Testing the full mask avoids penalizing callers that set __GFP_NO_KSWAPD only to avoid waking kswapd and disturbing the system.
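
    Restating the v3.17 rule as a predicate (the helper name is my own, not from the patch; the condition mirrors the code excerpt below):

    /*
     * Illustrative helper, not upstream code: only a true THP fault from user
     * context, i.e. a gfp_mask that contains all of GFP_TRANSHUGE, keeps using
     * MIGRATE_ASYNC for every compaction attempt; __GFP_NO_KSWAPD alone is no
     * longer enough.
     */
    static bool thp_fault_keeps_async(gfp_t gfp_mask)
    {
            return (gfp_mask & GFP_TRANSHUGE) == GFP_TRANSHUGE &&
                   !(current->flags & PF_KTHREAD);
    }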

    The change here is due to mm, thp: restructure thp avoidance of light synchronous migration.

    2627         /*
    2628          * Try direct compaction. The first pass is asynchronous. Subsequent
    2629          * attempts after direct reclaim are synchronous
    2630          */
    2631         page = __alloc_pages_direct_compact(gfp_mask, order, zonelist,
    2632                                         high_zoneidx, nodemask, alloc_flags,
    2633                                         preferred_zone,
    2634                                         classzone_idx, migratetype,
    2635                                         migration_mode, &contended_compaction,
    2636                                         &deferred_compaction,
    2637                                         &did_some_progress);
    2638         if (page)
    2639                 goto got_pg;
    2640 
    2641         /*
    2642          * If compaction is deferred for high-order allocations, it is because
    2643          * sync compaction recently failed. In this is the case and the caller
    2644          * requested a movable allocation that does not heavily disrupt the
    2645          * system then fail the allocation instead of entering direct reclaim.
    2646          */
    2647         if ((deferred_compaction || contended_compaction) &&
    2648                                                 (gfp_mask & __GFP_NO_KSWAPD))
    2649                 goto nopage;
    2650 
    2651         /*
    2652          * It can become very expensive to allocate transparent hugepages at
    2653          * fault, so use asynchronous memory compaction for THP unless it is
    2654          * khugepaged trying to collapse.
    2655          */
    2656         if ((gfp_mask & GFP_TRANSHUGE) != GFP_TRANSHUGE ||
    2657                                                 (current->flags & PF_KTHREAD))
    2658                 migration_mode = MIGRATE_SYNC_LIGHT;
    

    conclusion
    This post discusses how the conditions for synchronous migration in compaction change across v3.15, v3.16, and v3.17. These changes come from two patches.

  • mm, thp: avoid excessive compaction latency during fault
  • mm, thp: restructure thp avoidance of light synchronous migration

    patch discussion: mm/compaction: do not count migratepages when unnecessary

    December 1, 2015

    This post discusses mm/compaction: do not count migratepages when unnecessary.

    merge time
    v3.16

    what does update_nr_listpages() do
    update_nr_listpages() updates cc->nr_migratepages and cc->nr_freepages.

  • set cc->nr_migratepages to the length of cc->migratepages
  • set cc->nr_freepages to the length of cc->freepages

    801 /*
    802  * We cannot control nr_migratepages and nr_freepages fully when migration is
    803  * running as migrate_pages() has no knowledge of compact_control. When
    804  * migration is complete, we count the number of pages on the lists by hand.
    805  */
    806 static void update_nr_listpages(struct compact_control *cc)
    807 {
    808         int nr_migratepages = 0;
    809         int nr_freepages = 0;
    810         struct page *page;
    811 
    812         list_for_each_entry(page, &cc->migratepages, lru)
    813                 nr_migratepages++;
    814         list_for_each_entry(page, &cc->freepages, lru)
    815                 nr_freepages++;
    816 
    817         cc->nr_migratepages = nr_migratepages;
    818         cc->nr_freepages = nr_freepages;
    819 }
    

    compact_zone(), migrate_pages() and update_nr_listpages()
    compact_zone() calls migrate_pages() to migrate pages, but migrate_pages() knows nothing about compact_control and cannot update cc->nr_migratepages or cc->nr_freepages. compact_zone() therefore calls update_nr_listpages() afterwards to bring the compaction control back in sync.

    If the tracepoint is enabled, it needs the correct numbers to report:

  • there is no need to update cc->nr_freepages
  • if migrate_pages() returns a positive number (the pages it failed to migrate), cc->nr_migratepages needs to be updated
  • if migrate_pages() returns a negative error code, cc->nr_migratepages also needs to be updated (a hedged sketch after the code excerpt below shows how the remaining count can be derived without update_nr_listpages())

    If the tracepoint is not enabled:

  • there is no need to update cc->nr_migratepages or cc->nr_freepages; both lists are cleared at the end of the compaction while loop in compact_zone()

    1011         while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
    1012                 unsigned long nr_migrate, nr_remaining;
    1013                 int err;
    1014 
    1015                 switch (isolate_migratepages(zone, cc)) {
    1016                 case ISOLATE_ABORT:
    1017                         ret = COMPACT_PARTIAL;
    1018                         putback_movable_pages(&cc->migratepages);
    1019                         cc->nr_migratepages = 0;
    1020                         goto out;
    1021                 case ISOLATE_NONE:
    1022                         continue;
    1023                 case ISOLATE_SUCCESS:
    1024                         ;
    1025                 }
    1026 
    1027                 nr_migrate = cc->nr_migratepages;
    1028                 err = migrate_pages(&cc->migratepages, compaction_alloc,
    1029                                 (unsigned long)cc,
    1030                                 cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC,
    1031                                 MR_COMPACTION);
    1032                 update_nr_listpages(cc);
    1033                 nr_remaining = cc->nr_migratepages;
    1034 
    1035                 trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
    1036                                                 nr_remaining);
    1037 
    1038                 /* Release isolated pages not migrated */
    1039                 if (err) {
    1040                         putback_movable_pages(&cc->migratepages);
    1041                         cc->nr_migratepages = 0;
    1042                         /*
    1043                          * migrate_pages() may return -ENOMEM when scanners meet
    1044                          * and we want compact_finished() to detect it
    1045                          */
    1046                         if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
    1047                                 ret = COMPACT_PARTIAL;
    1048                                 goto out;
    1049                         }
    1050                 }
    1051         }
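
    The patch removes update_nr_listpages(). A hedged sketch, not the upstream hunk, of how the remaining count can be derived only when it is actually needed (the helper name is mine):

    /*
     * Hedged sketch: migrate_pages() already returns the number of pages it
     * failed to migrate (or a negative errno), so walking cc->migratepages is
     * only needed in the errno case, and only when the tracepoint consumes
     * the number.
     */
    static unsigned long nr_remaining_migratepages(struct compact_control *cc,
                                                   int migrate_ret)
    {
            struct page *page;
            unsigned long nr_remaining = 0;

            if (migrate_ret >= 0)
                    return migrate_ret;     /* pages that were not migrated */

            list_for_each_entry(page, &cc->migratepages, lru)
                    nr_remaining++;         /* count by hand on error */

            return nr_remaining;
    }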
    

    conclusion
    This post discusses mm/compaction: do not count migratepages when unnecessary. It explains why update_nr_listpages() has no effect when the tracepoint is disabled. The patch removes update_nr_listpages() entirely as of v3.16.

    patch discussion: mm, compaction: terminate async compaction when rescheduling

    November 30, 2015

    This post discusses mm, compaction: terminate async compaction when rescheduling.

    merge time
    v3.16

    symptom
    In isolate_migratepages_range(), the unconditional cond_resched() may reschedule, after which need_resched() in should_release_lock() returns false. As a result, some async compactions do not abort as expected by mm: compaction: minimise the time IRQs are disabled while isolating pages for migration.

    456 unsigned long
    457 isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
    458                 unsigned long low_pfn, unsigned long end_pfn, bool unevictable)
    459 {
    460         unsigned long last_pageblock_nr = 0, pageblock_nr;
    461         unsigned long nr_scanned = 0, nr_isolated = 0;
    462         struct list_head *migratelist = &cc->migratepages;
    463         struct lruvec *lruvec;
    464         unsigned long flags;
    465         bool locked = false;
    466         struct page *page = NULL, *valid_page = NULL;
    467         bool skipped_async_unsuitable = false;
    468         const isolate_mode_t mode = (!cc->sync ? ISOLATE_ASYNC_MIGRATE : 0) |
    469                                     (unevictable ? ISOLATE_UNEVICTABLE : 0);
    470 
    471         /*
    472          * Ensure that there are not too many pages isolated from the LRU
    473          * list by either parallel reclaimers or compaction. If there are,
    474          * delay for some time until fewer pages are isolated
    475          */
    476         while (unlikely(too_many_isolated(zone))) {
    477                 /* async migration should just abort */
    478                 if (!cc->sync)
    479                         return 0;
    480 
    481                 congestion_wait(BLK_RW_ASYNC, HZ/10);
    482 
    483                 if (fatal_signal_pending(current))
    484                         return 0;
    485         }
    486 
    487         /* Time to isolate some pages for migration */
    488         cond_resched();
    489         for (; low_pfn < end_pfn; low_pfn++) {
    490                 /* give a chance to irqs before checking need_resched() */
    491                 if (locked && !(low_pfn % SWAP_CLUSTER_MAX)) {
    492                         if (should_release_lock(&zone->lru_lock)) {
    493                                 spin_unlock_irqrestore(&zone->lru_lock, flags);
    494                                 locked = false;
    495                         }
    496                 }
    
    174 static inline bool should_release_lock(spinlock_t *lock)
    175 {
    176         return need_resched() || spin_is_contended(lock);
    177 }
    

    effects of this patch

  • If cond_resched() reschedules and the compaction is asynchronous, isolate_migratepages_range() returns 0, which isolate_migratepages() reports as ISOLATE_ABORT.
  • compact_zone() will return COMPACT_PARTIAL = 2.
  • compact_zone_order() will return COMPACT_PARTIAL = 2.
  • try_to_compact_pages() will return COMPACT_PARTIAL = 2, unless another zone in the zonelist returns a higher value such as COMPACT_COMPLETE = 3.
  • __alloc_pages_direct_compact() sets *did_some_progress to 2. If it still cannot take a page from the freelist, it does not call defer_compaction(), because this compaction was asynchronous.
  • Since *did_some_progress > 0, the OOM killer will not be triggered in this round of rebalance.
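
    For reference, the return codes cited above come from include/linux/compaction.h; around v3.16 they look roughly like this (comments paraphrased):

    /* Return values for compact_zone() and try_to_compact_pages() */
    #define COMPACT_SKIPPED         0       /* compaction did not start */
    #define COMPACT_CONTINUE        1       /* continue to another pageblock */
    #define COMPACT_PARTIAL         2       /* partial progress, a suitable page may exist */
    #define COMPACT_COMPLETE        3       /* the full zone was compacted */
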
    diff --git a/mm/compaction.c b/mm/compaction.c
    index 217a6ad..56331f5 100644
    --- a/mm/compaction.c
    +++ b/mm/compaction.c
    @@ -494,8 +494,13 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
     			return 0;
     	}
     
    +	if (cond_resched()) {
    +		/* Async terminates prematurely on need_resched() */
    +		if (cc->mode == MIGRATE_ASYNC)
    +			return 0;
    +	}
    +
     	/* Time to isolate some pages for migration */
    -	cond_resched();
     	for (; low_pfn < end_pfn; low_pfn++) {
     		/* give a chance to irqs before checking need_resched() */
     		if (locked && !(low_pfn % SWAP_CLUSTER_MAX)) {
    

    conclusion
    This post discusses how mm, compaction: terminate async compaction when rescheduling fixes the case in which asynchronous compaction does not abort as expected. It also walks through the allocator's behaviour when an asynchronous compaction aborts this way.

    patch discussion: mm, thp: avoid excessive compaction latency during fault

    November 30, 2015

    This post is to discuss patch mm, thp: avoid excessive compaction latency during fault.

    merge time
    v3.16

    effects of this patch

    page allocations from a kernel thread

  • the first compaction in the allocation slow path uses MIGRATE_ASYNC
  • subsequent compactions in the allocation slow path use MIGRATE_SYNC_LIGHT

    normal page allocations from a user space thread

  • the first compaction in the allocation slow path uses MIGRATE_ASYNC
  • subsequent compactions in the allocation slow path use MIGRATE_SYNC_LIGHT

    transparent huge page allocations from a user space thread

  • every compaction in the allocation slow path uses MIGRATE_ASYNC

    diff --git a/mm/page_alloc.c b/mm/page_alloc.c
    index afb29da..d88d675 100644
    --- a/mm/page_alloc.c
    +++ b/mm/page_alloc.c
    @@ -2575,7 +2575,14 @@ rebalance:
     					&did_some_progress);
     	if (page)
     		goto got_pg;
    -	migration_mode = MIGRATE_SYNC_LIGHT;
    +
    +	/*
    +	 * It can become very expensive to allocate transparent hugepages at
    +	 * fault, so use asynchronous memory compaction for THP unless it is
    +	 * khugepaged trying to collapse.
    +	 */
    +	if (!(gfp_mask & __GFP_NO_KSWAPD) || (current->flags & PF_KTHREAD))
    +		migration_mode = MIGRATE_SYNC_LIGHT;
     
     	/*
     	 * If compaction is deferred for high-order allocations, it is because
    

    __GFP_NO_KSWAPD and huge page allocation
    get_huge_zero_page(), which returns the huge zero page, calls alloc_pages() with GFP_TRANSHUGE set in gfp_mask. GFP_TRANSHUGE includes __GFP_NO_KSWAPD, so this patch tests __GFP_NO_KSWAPD to guess that the allocation is for a transparent hugepage.

    kernel: mm: gfp_mask and ion system heap allocation and kernel: mm: gfp_mask and kgsl allocator show that the ion system heap and kgsl allocators also allocate high-order pages with __GFP_NO_KSWAPD set in gfp_mask. Because they also set __GFP_NORETRY, those allocations will not try a second round of compaction in the slow path, so this patch has no effect on them.

    This patch might still affect other allocations that set __GFP_NO_KSWAPD in gfp_mask.
    Since __GFP_NO_KSWAPD means the caller does not want to disturb the system by waking kswapd, it is reasonable to avoid synchronous compaction for those callers as well.
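
    Restating the rule from the diff above as a predicate (the helper name and wrapper are my own, not from the patch):

    /*
     * Illustrative helper, not upstream code: after this patch, only
     * allocations that set __GFP_NO_KSWAPD and come from user context stay on
     * MIGRATE_ASYNC for every compaction attempt in the slow path.
     */
    static bool keep_async_compaction(gfp_t gfp_mask)
    {
            return (gfp_mask & __GFP_NO_KSWAPD) &&
                   !(current->flags & PF_KTHREAD);
    }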

    123 #define GFP_TRANSHUGE   (GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
    124                          __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
    125                          __GFP_NO_KSWAPD)
    
    176 struct page *get_huge_zero_page(void)
    177 {
    178         struct page *zero_page;
    179 retry:
    180         if (likely(atomic_inc_not_zero(&huge_zero_refcount)))
    181                 return READ_ONCE(huge_zero_page);
    182 
    183         zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE,
    184                         HPAGE_PMD_ORDER);
    185         if (!zero_page) {
    186                 count_vm_event(THP_ZERO_PAGE_ALLOC_FAILED);
    187                 return NULL;
    188         }
    189         count_vm_event(THP_ZERO_PAGE_ALLOC);
    190         preempt_disable();
    191         if (cmpxchg(&huge_zero_page, NULL, zero_page)) {
    192                 preempt_enable();
    193                 __free_pages(zero_page, compound_order(zero_page));
    194                 goto retry;
    195         }
    196 
    197         /* We take additional reference here. It will be put back by shrinker */
    198         atomic_set(&huge_zero_refcount, 2);
    199         preempt_enable();
    200         return READ_ONCE(huge_zero_page);
    201 }
    

    conclusion
    This post is to discuss how mm, thp: avoid excessive compaction latency during fault reduces compaction effort when __GFP_NO_KSWAPD is set.

    patch discussion: mm, compaction: embed migration mode in compact_control

    November 30, 2015

    This post is to discuss mm, compaction: embed migration mode in compact_control.

    merge time
    v3.16

    effects of this patch
    compact_control’s bool sync field is replaced by enum migrate_mode mode.

    how callers set up compaction control’s migrate_mode

  • compaction through /proc/sys/vm/compact_memory sets mode to MIGRATE_SYNC
  • compaction from kswapd sets mode to MIGRATE_ASYNC
  • the first compaction from the allocation slow path sets mode to MIGRATE_ASYNC
  • subsequent compactions from the allocation slow path set mode to MIGRATE_SYNC_LIGHT (a hedged sketch after this list shows how the slow path's mode reaches compact_control)
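
    A hedged sketch of how the slow path's migration_mode ends up in compact_control after this patch; the function is abbreviated (parameters and fields trimmed) and is not the exact upstream code:

    /*
     * Sketch: the allocation slow path passes its migration_mode to
     * try_to_compact_pages(), which hands it to compact_zone_order(), which
     * simply records it in compact_control.
     */
    static unsigned long compact_zone_order(struct zone *zone, int order,
                                            enum migrate_mode mode)
    {
            struct compact_control cc = {
                    .order = order,
                    .zone  = zone,
                    .mode  = mode,          /* replaces the old bool sync */
            };

            INIT_LIST_HEAD(&cc.freepages);
            INIT_LIST_HEAD(&cc.migratepages);

            return compact_zone(zone, &cc);
    }
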
    zone’s cached migrate pfn and compaction control’s migrate_mode

  • zone->compact_cached_migrate_pfn[0] is for MIGRATE_ASYNC.
  • zone->compact_cached_migrate_pfn[1] is for MIGRATE_SYNC_LIGHT and MIGRATE_SYNC.

    diff --git a/mm/internal.h b/mm/internal.h
    index 6ee580d..a25424a 100644
    --- a/mm/internal.h
    +++ b/mm/internal.h
    @@ -134,7 +134,7 @@ struct compact_control {
     	unsigned long nr_migratepages;	/* Number of pages to migrate */
     	unsigned long free_pfn;		/* isolate_freepages search base */
     	unsigned long migrate_pfn;	/* isolate_migratepages search base */
    -	bool sync;			/* Synchronous migration */
    +	enum migrate_mode mode;		/* Async or sync migration mode */
     	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
     	bool finished_update_free;	/* True when the zone cached pfns are
     					 * no longer being updated
    

    conclusion
    This post is to discuss the effects of mm, compaction: embed migration mode in compact_control.

    patch discussion: mm, compaction: add per-zone migration pfn cache for async compaction

    November 30, 2015

    This post is to discuss patch mm, compaction: add per-zone migration pfn cache for async compaction.

    merge time
    v3.16

    symptom in v3.15
    If the migrate scanner of an async compaction finds a pageblock whose migrate type is neither CMA nor Movable, it skips the pageblock, but it updates neither the pageblock skip flag nor the zone's compact_cached_migrate_pfn. The next async compaction therefore scans and skips the same pageblocks again, which wastes CPU, especially on systems with large memory.

    528                 /* If isolation recently failed, do not retry */
    529                 pageblock_nr = low_pfn >> pageblock_order;
    530                 if (last_pageblock_nr != pageblock_nr) {
    531                         int mt;
    532 
    533                         last_pageblock_nr = pageblock_nr;
    534                         if (!isolation_suitable(cc, page))
    535                                 goto next_pageblock;
    536 
    537                         /*
    538                          * For async migration, also only scan in MOVABLE
    539                          * blocks. Async migration is optimistic to see if
    540                          * the minimum amount of work satisfies the allocation
    541                          */
    542                         mt = get_pageblock_migratetype(page);
    543                         if (!cc->sync && !migrate_async_suitable(mt)) {
    544                                 cc->finished_update_migrate = true;
    545                                 skipped_async_unsuitable = true;
    546                                 goto next_pageblock;
    547                         }
    548                 }
    

    how does the patch improve this
    Each zone now has two cached migrate pfns: one for async compaction and one for sync compaction.

  • If an async compaction's migrate scanner skips a pageblock whose migrate type is neither CMA nor Movable, it now updates zone->compact_cached_migrate_pfn[0], though it still does not set the pageblock's skip flag.
  • The following async compaction can use the updated zone->compact_cached_migrate_pfn[0] to keep its migrate scanner from scanning these pages again (a hedged sketch of the caching rule follows this list).
  • I am not sure why this patch does not set the pageblock's skip flag for async compaction. In v4.3, both the zone's cached async migrate pfn and the pageblock's skip flag are updated.
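
    A minimal sketch of the caching rule, using a helper name of my own; the upstream patch does the equivalent inline when it updates the cached scanner positions:

    /*
     * Sketch: index 0 caches where the async migrate scanner should resume,
     * index 1 where the sync scanner should resume.  An async pass only
     * advances the async cache, so sync compaction still revisits the blocks
     * async compaction gave up on.
     */
    static void cache_migrate_pfn(struct zone *zone, struct compact_control *cc,
                                  unsigned long pfn)
    {
            if (pfn > zone->compact_cached_migrate_pfn[0])
                    zone->compact_cached_migrate_pfn[0] = pfn;
            if (cc->sync && pfn > zone->compact_cached_migrate_pfn[1])
                    zone->compact_cached_migrate_pfn[1] = pfn;
    }
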
    diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
    index ae693e1..10a96ee 100644
    --- a/include/linux/mmzone.h
    +++ b/include/linux/mmzone.h
    @@ -360,9 +360,10 @@ struct zone {
     	/* Set to true when the PG_migrate_skip bits should be cleared */
     	bool			compact_blockskip_flush;
     
    -	/* pfns where compaction scanners should start */
    +	/* pfn where compaction free scanner should start */
     	unsigned long		compact_cached_free_pfn;
    -	unsigned long		compact_cached_migrate_pfn;
    +	/* pfn where async and sync compaction migration scanner should start */
    +	unsigned long		compact_cached_migrate_pfn[2];
     #endif
     #ifdef CONFIG_MEMORY_HOTPLUG
     	/* see spanned/present_pages for more description */
    

    conclusion
    This post is to discuss how mm, compaction: add per-zone migration pfn cache for async compaction improves async compaction's efficiency by keeping a separate cached migrate pfn for async compaction in each zone.

    patch discussion: mm/compaction: disallow high-order page for migration target

    November 29, 2015

    This post is to discuss mm/compaction: disallow high-order page for migration target.

    merge time
    v3.15

    effect of this patch
    After this patch, the free scanner only isolates free pages from CMA or Movable pageblocks, and a pageblock that is itself one large free buddy page (order >= pageblock_order) is no longer used as a migration target.

    diff --git a/mm/compaction.c b/mm/compaction.c
    index b6ab771..9a03fdb 100644
    --- a/mm/compaction.c
    +++ b/mm/compaction.c
    @@ -217,21 +217,12 @@ static inline bool compact_trylock_irqsave(spinlock_t *lock,
     /* Returns true if the page is within a block suitable for migration to */
     static bool suitable_migration_target(struct page *page)
     {
    -	int migratetype = get_pageblock_migratetype(page);
    -
    -	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
    -	if (migratetype == MIGRATE_RESERVE)
    -		return false;
    -
    -	if (is_migrate_isolate(migratetype))
    -		return false;
    -
    -	/* If the page is a large free page, then allow migration */
    +	/* If the page is a large free page, then disallow migration */
     	if (PageBuddy(page) && page_order(page) >= pageblock_order)
    -		return true;
    +		return false;
     
     	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
    -	if (migrate_async_suitable(migratetype))
    +	if (migrate_async_suitable(get_pageblock_migratetype(page)))
     		return true;
     
     	/* Otherwise skip the block */
    

    compaction and migration type
    From v3.15 to v4.3, compaction makes no progress when all free pages sit in Unmovable and Reclaimable pageblocks, as in the /proc/pagetypeinfo snapshot below.
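
    To make the limitation concrete, here is suitable_migration_target() as it reads after applying the diff above: when every free page sits in an Unmovable or Reclaimable pageblock, neither branch returns true, so the free scanner finds nothing to isolate.

    /* Returns true if the page is within a block suitable for migration to */
    static bool suitable_migration_target(struct page *page)
    {
            /* A pageblock that is one large free buddy page is left alone */
            if (PageBuddy(page) && page_order(page) >= pageblock_order)
                    return false;

            /* Only MIGRATE_MOVABLE and MIGRATE_CMA blocks are used as targets */
            if (migrate_async_suitable(get_pageblock_migratetype(page)))
                    return true;

            /* Otherwise skip the block */
            return false;
    }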

    ------ PAGETYPEINFO (/proc/pagetypeinfo) ------
    Page block order: 10
    Pages per block:  1024  
    
    Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10    
    Node    0, zone      DMA, type    Unmovable   1533   3972    896      0      0      0      0      0      0      0      0     
    Node    0, zone      DMA, type  Reclaimable    618   3006      6      0      0      0      0      0      0      0      0     
    Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      0     
    Node    0, zone      DMA, type      Reserve      0      0      3      5      0      0      0      0      0      0      0     
    Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0     
    Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0     
    
    Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
    Node 0, zone      DMA          215            8          396            2          106            0
    

    conclusion
    This post discusses the effect of this patch and a compaction limitation related to migrate types that holds from v3.15 to v4.3.

    patch discussion: mm, compaction: return failed migration target pages back to freelist

    November 29, 2015

    This post is to discuss patch mm, compaction: return failed migration target pages back to freelist.

    merge time
    v3.16

    migrate_pages() and unmap_and_move() in v3.15
    In v3.15, migrate_pages() and unmap_and_move() take a get_new_page callback that lets the free scanner supply isolated free pages as migration targets. If a page migration fails, the isolated target page is returned to the buddy system by putback_lru_page(newpage). That wastes work, since the page could be reused by subsequent migrations.

    941 static int unmap_and_move(new_page_t get_new_page, unsigned long private,
    942                         struct page *page, int force, enum migrate_mode mode)
    943 {
    944         int rc = 0;
    945         int *result = NULL;
    946         struct page *newpage = get_new_page(page, private, &result);
    947 
    948         if (!newpage)
    949                 return -ENOMEM;
    950 
    951         if (page_count(page) == 1) {
    952                 /* page was freed from under us. So we are done. */
    953                 goto out;
    954         }
    955 
    956         if (unlikely(PageTransHuge(page)))
    957                 if (unlikely(split_huge_page(page)))
    958                         goto out;
    959 
    960         rc = __unmap_and_move(page, newpage, force, mode);
    961 
    962         if (unlikely(rc == MIGRATEPAGE_BALLOON_SUCCESS)) {
    963                 /*
    964                  * A ballooned page has been migrated already.
    965                  * Now, it's the time to wrap-up counters,
    966                  * handle the page back to Buddy and return.
    967                  */
    968                 dec_zone_page_state(page, NR_ISOLATED_ANON +
    969                                     page_is_file_cache(page));
    970                 balloon_page_free(page);
    971                 return MIGRATEPAGE_SUCCESS;
    972         }
    973 out:
    974         if (rc != -EAGAIN) {
    975                 /*
    976                  * A page that has been migrated has all references
    977                  * removed and will be freed. A page that has not been
    978                  * migrated will have kepts its references and be
    979                  * restored.
    980                  */
    981                 list_del(&page->lru);
    982                 dec_zone_page_state(page, NR_ISOLATED_ANON +
    983                                 page_is_file_cache(page));
    984                 putback_lru_page(page);
    985         }
    986         /*
    987          * Move the new page to the LRU. If migration was not successful
    988          * then this will free the page.
    989          */
    990         putback_lru_page(newpage);
    991         if (result) {
    992                 if (rc)
    993                         *result = rc;
    994                 else
    995                         *result = page_to_nid(newpage);
    996         }
    997         return rc;
    998 }
    

    migrate_pages() and unmap_and_move() in v3.16
    In v3.16, migrate_pages() and unmap_and_move() accept a put_new_page callback in addition to get_new_page. If migration fails, unmap_and_move() calls the put_new_page callback to hand back the unused target page.

    This extension is done by mm, migration: add destination page freeing callback.

    939 static int unmap_and_move(new_page_t get_new_page, free_page_t put_new_page,
    940                         unsigned long private, struct page *page, int force,
    941                         enum migrate_mode mode)
    942 {
    943         int rc = 0;
    944         int *result = NULL;
    945         struct page *newpage = get_new_page(page, private, &result);
    946 
    947         if (!newpage)
    948                 return -ENOMEM;
    949 
    950         if (page_count(page) == 1) {
    951                 /* page was freed from under us. So we are done. */
    952                 goto out;
    953         }
    954 
    955         if (unlikely(PageTransHuge(page)))
    956                 if (unlikely(split_huge_page(page)))
    957                         goto out;
    958 
    959         rc = __unmap_and_move(page, newpage, force, mode);
    960 
    961         if (unlikely(rc == MIGRATEPAGE_BALLOON_SUCCESS)) {
    962                 /*
    963                  * A ballooned page has been migrated already.
    964                  * Now, it's the time to wrap-up counters,
    965                  * handle the page back to Buddy and return.
    966                  */
    967                 dec_zone_page_state(page, NR_ISOLATED_ANON +
    968                                     page_is_file_cache(page));
    969                 balloon_page_free(page);
    970                 return MIGRATEPAGE_SUCCESS;
    971         }
    972 out:
    973         if (rc != -EAGAIN) {
    974                 /*
    975                  * A page that has been migrated has all references
    976                  * removed and will be freed. A page that has not been
    977                  * migrated will have kepts its references and be
    978                  * restored.
    979                  */
    980                 list_del(&page->lru);
    981                 dec_zone_page_state(page, NR_ISOLATED_ANON +
    982                                 page_is_file_cache(page));
    983                 putback_lru_page(page);
    984         }
    985 
    986         /*
    987          * If migration was not successful and there's a freeing callback, use
    988          * it.  Otherwise, putback_lru_page() will drop the reference grabbed
    989          * during isolation.
    990          */
    991         if (rc != MIGRATEPAGE_SUCCESS && put_new_page) {
    992                 ClearPageSwapBacked(newpage);
    993                 put_new_page(newpage, private);
    994         } else
    995                 putback_lru_page(newpage);
    996 
    997         if (result) {
    998                 if (rc)
    999                         *result = rc;
    1000                 else
    1001                         *result = page_to_nid(newpage);
    1002         }
    1003         return rc;
    1004 }
    

    effect of this patch
    This patch adds the compaction_free() callback. compact_zone() now calls migrate_pages() with compaction_free() as the put_new_page argument, so unused isolated target pages go back onto compact_control's freepages list instead of the buddy allocator, which improves compaction efficiency.

    
    diff --git a/include/linux/migrate.h b/include/linux/migrate.h
    index 84a31ad..a2901c4 100644
    --- a/include/linux/migrate.h
    +++ b/include/linux/migrate.h
    +static void compaction_free(struct page *page, unsigned long data)
    +{
    +	struct compact_control *cc = (struct compact_control *)data;
    +
    +	list_add(&page->lru, &cc->freepages);
    +	cc->nr_freepages++;
    +}
    
     /* possible outcome of isolate_migratepages */
    @@ -1016,8 +1025,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
     		}
     
     		nr_migrate = cc->nr_migratepages;
    -		err = migrate_pages(&cc->migratepages, compaction_alloc, NULL,
    -				(unsigned long)cc,
    +		err = migrate_pages(&cc->migratepages, compaction_alloc,
    +				compaction_free, (unsigned long)cc,
     				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC,
     				MR_COMPACTION);
     		update_nr_listpages(cc);
    

    conclusion
    This post shows how migrate_pages() and unmap_and_move() evolved from v3.15 to v3.16. It also discusses how this patch helps increase compaction efficiency.

    patch discussion: mm, compaction: ignore pageblock skip when manually invoking compaction

    November 28, 2015

    This post is to discuss kernel patch mm, compaction: ignore pageblock skip when manually invoking compaction.

    merge time
    v3.15

    what the patch does
    This patch sets compact_control's ignore_skip_hint when compaction is triggered through /proc/sys/vm/compact_memory. This forces the compaction to consider every pageblock, regardless of pageblock skip flags.

    diff --git a/mm/compaction.c b/mm/compaction.c
    index 9185775..37b3799 100644
    --- a/mm/compaction.c
    +++ b/mm/compaction.c
    @@ -1186,6 +1186,7 @@ static void compact_node(int nid)
     	struct compact_control cc = {
     		.order = -1,
     		.sync = true,
    +		.ignore_skip_hint = true,
     	};
     
     	__compact_pgdat(NODE_DATA(nid), &cc);
    

    what is the effect of compaction.ignore_skip_hint

  • compact_zone() calls isolate_migratepages() to isolate in-use pages starting at cc->migrate_pfn.
  • isolate_migratepages_range() scans pages one by one. When it reaches a page in a new pageblock, it calls isolation_suitable() to decide whether the whole pageblock should be skipped.
  • isolation_suitable() returns false if the pageblock's skip flag is set; cc->ignore_skip_hint forces it to return true, so the pageblock is scanned anyway.
  • isolate_freepages() also consults cc->ignore_skip_hint when deciding whether to skip a pageblock.
    456 unsigned long
    457 isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
    458                 unsigned long low_pfn, unsigned long end_pfn, bool unevictable)
    459 {
    460         unsigned long last_pageblock_nr = 0, pageblock_nr;
    461         unsigned long nr_scanned = 0, nr_isolated = 0;
    462         struct list_head *migratelist = &cc->migratepages;
    463         struct lruvec *lruvec;
    464         unsigned long flags;
    465         bool locked = false;
    466         struct page *page = NULL, *valid_page = NULL;
    467         bool skipped_async_unsuitable = false;
    468         const isolate_mode_t mode = (!cc->sync ? ISOLATE_ASYNC_MIGRATE : 0) |
    469                                     (unevictable ? ISOLATE_UNEVICTABLE : 0);
    470 
    471         /*
    472          * Ensure that there are not too many pages isolated from the LRU
    473          * list by either parallel reclaimers or compaction. If there are,
    474          * delay for some time until fewer pages are isolated
    475          */
    476         while (unlikely(too_many_isolated(zone))) {
    477                 /* async migration should just abort */
    478                 if (!cc->sync)
    479                         return 0;
    480 
    481                 congestion_wait(BLK_RW_ASYNC, HZ/10);
    482 
    483                 if (fatal_signal_pending(current))
    484                         return 0;
    485         }
    486 
    487         /* Time to isolate some pages for migration */
    488         cond_resched();
    489         for (; low_pfn < end_pfn; low_pfn++) {
    490                 /* give a chance to irqs before checking need_resched() */
    491                 if (locked && !(low_pfn % SWAP_CLUSTER_MAX)) {
    492                         if (should_release_lock(&zone->lru_lock)) {
    493                                 spin_unlock_irqrestore(&zone->lru_lock, flags);
    494                                 locked = false;
    495                         }
    496                 }
    497 
    498                 /*
    499                  * migrate_pfn does not necessarily start aligned to a
    500                  * pageblock. Ensure that pfn_valid is called when moving
    501                  * into a new MAX_ORDER_NR_PAGES range in case of large
    502                  * memory holes within the zone
    503                  */
    504                 if ((low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
    505                         if (!pfn_valid(low_pfn)) {
    506                                 low_pfn += MAX_ORDER_NR_PAGES - 1;
    507                                 continue;
    508                         }
    509                 }
    510 
    511                 if (!pfn_valid_within(low_pfn))
    512                         continue;
    513                 nr_scanned++;
    514 
    515                 /*
    516                  * Get the page and ensure the page is within the same zone.
    517                  * See the comment in isolate_freepages about overlapping
    518                  * nodes. It is deliberate that the new zone lock is not taken
    519                  * as memory compaction should not move pages between nodes.
    520                  */
    521                 page = pfn_to_page(low_pfn);
    522                 if (page_zone(page) != zone)
    523                         continue;
    524 
    525                 if (!valid_page)
    526                         valid_page = page;
    527 
    528                 /* If isolation recently failed, do not retry */
    529                 pageblock_nr = low_pfn >> pageblock_order;
    530                 if (last_pageblock_nr != pageblock_nr) {
    531                         int mt;
    532 
    533                         last_pageblock_nr = pageblock_nr;
    534                         if (!isolation_suitable(cc, page))
    535                                 goto next_pageblock;
    536 
    537                         /*
    538                          * For async migration, also only scan in MOVABLE
    539                          * blocks. Async migration is optimistic to see if
    540                          * the minimum amount of work satisfies the allocation
    541                          */
    542                         mt = get_pageblock_migratetype(page);
    543                         if (!cc->sync && !migrate_async_suitable(mt)) {
    544                                 cc->finished_update_migrate = true;
    545                                 skipped_async_unsuitable = true;
    546                                 goto next_pageblock;
    547                         }
    548                 }
    549 
    550                 /*
    551                  * Skip if free. page_order cannot be used without zone->lock
    552                  * as nothing prevents parallel allocations or buddy merging.
    553                  */
    554                 if (PageBuddy(page))
    555                         continue;
    556 
    557                 /*
    558                  * Check may be lockless but that's ok as we recheck later.
    559                  * It's possible to migrate LRU pages and balloon pages
    560                  * Skip any other type of page
    561                  */
    562                 if (!PageLRU(page)) {
    563                         if (unlikely(balloon_page_movable(page))) {
    564                                 if (locked && balloon_page_isolate(page)) {
    565                                         /* Successfully isolated */
    566                                         goto isolate_success;
    567                                 }
    568                         }
    569                         continue;
    570                 }
    571 
    572                 /*
    573                  * PageLRU is set. lru_lock normally excludes isolation
    574                  * splitting and collapsing (collapsing has already happened
    575                  * if PageLRU is set) but the lock is not necessarily taken
    576                  * here and it is wasteful to take it just to check transhuge.
    577                  * Check TransHuge without lock and skip the whole pageblock if
    578                  * it's either a transhuge or hugetlbfs page, as calling
    579                  * compound_order() without preventing THP from splitting the
    580                  * page underneath us may return surprising results.
    581                  */
    582                 if (PageTransHuge(page)) {
    583                         if (!locked)
    584                                 goto next_pageblock;
    585                         low_pfn += (1 << compound_order(page)) - 1;
    586                         continue;
    587                 }
    588 
    589                 /*
    590                  * Migration will fail if an anonymous page is pinned in memory,
    591                  * so avoid taking lru_lock and isolating it unnecessarily in an
    592                  * admittedly racy check.
    593                  */
    594                 if (!page_mapping(page) &&
    595                     page_count(page) > page_mapcount(page))
    596                         continue;
    597 
    598                 /* Check if it is ok to still hold the lock */
    599                 locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
    600                                                                 locked, cc);
    601                 if (!locked || fatal_signal_pending(current))
    602                         break;
    603 
    604                 /* Recheck PageLRU and PageTransHuge under lock */
    605                 if (!PageLRU(page))
    606                         continue;
    607                 if (PageTransHuge(page)) {
    608                         low_pfn += (1 << compound_order(page)) - 1;
    609                         continue;
    610                 }
    611 
    612                 lruvec = mem_cgroup_page_lruvec(page, zone);
    613 
    614                 /* Try isolate the page */
    615                 if (__isolate_lru_page(page, mode) != 0)
    616                         continue;
    617 
    618                 VM_BUG_ON_PAGE(PageTransCompound(page), page);
    619 
    620                 /* Successfully isolated */
    621                 del_page_from_lru_list(page, lruvec, page_lru(page));
    622 
    623 isolate_success:
    624                 cc->finished_update_migrate = true;
    625                 list_add(&page->lru, migratelist);
    626                 cc->nr_migratepages++;
    627                 nr_isolated++;
    628 
    629                 /* Avoid isolating too much */
    630                 if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
    631                         ++low_pfn;
    632                         break;
    633                 }
    634 
    635                 continue;
    636 
    637 next_pageblock:
    638                 low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
    639         }
    640 
    641         acct_isolated(zone, locked, cc);
    642 
    643         if (locked)
    644                 spin_unlock_irqrestore(&zone->lru_lock, flags);
    645 
    646         /*
    647          * Update the pageblock-skip information and cached scanner pfn,
    648          * if the whole pageblock was scanned without isolating any page.
    649          * This is not done when pageblock was skipped due to being unsuitable
    650          * for async compaction, so that eventual sync compaction can try.
    651          */
    652         if (low_pfn == end_pfn && !skipped_async_unsuitable)
    653                 update_pageblock_skip(cc, valid_page, nr_isolated, true);
    654 
    655         trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
    656 
    657         count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
    658         if (nr_isolated)
    659                 count_compact_events(COMPACTISOLATED, nr_isolated);
    660 
    661         return low_pfn;
    662 }
    
     72 static inline bool isolation_suitable(struct compact_control *cc,
     73                                         struct page *page)
     74 {
     75         if (cc->ignore_skip_hint)
     76                 return true;
     77 
     78         return !get_pageblock_skip(page);
     79 }
    

    who also set ignore_skip_hint
    The CMA allocator calls alloc_contig_range() to allocate contiguous pages. It also sets compact_control's ignore_skip_hint to true, because CMA must not skip scanning any pageblock in the requested range.

    6247 int alloc_contig_range(unsigned long start, unsigned long end,
    6248                        unsigned migratetype)
    6249 {
    6250         unsigned long outer_start, outer_end;
    6251         int ret = 0, order;
    6252 
    6253         struct compact_control cc = {
    6254                 .nr_migratepages = 0,
    6255                 .order = -1,
    6256                 .zone = page_zone(pfn_to_page(start)),
    6257                 .sync = true,
    6258                 .ignore_skip_hint = true,
    6259         };
    

    conclusion
    compact_control's ignore_skip_hint is set to true in two cases.

  • compaction through /proc/sys/vm/compact_memory
  • the CMA allocator's alloc_contig_range()
