patch discussion: mm, migrate: count pages failing all retries in vmstat and tracepoint

December 8, 2015

This post discusses the patch mm, migrate: count pages failing all retries in vmstat and tracepoint.

merge at
git: kernel/git/mhocko/mm.git
branch: since-4.3

what is the problem of migrate_pages() in v4.3
Let’s consider the following conditions to understand how migrate_pages() works.

Assume the input argument from is a linked list of SWAP_CLUSTER_MAX = 32 pages. migrate_pages() then tries to migrate these 32 pages into free pages isolated by the free scanner.

Also assume that if a page fails to migrate due to -EAGAIN, every retry of that page fails with -EAGAIN as well. This makes the examples below easier to follow.

1. All 32 pages are migrated successfully.
   1.1 (nr_succeeded, nr_failed, retry, rc) = (32, 0, 0, 0).
   1.2 The from list is empty when it returns.
   1.3 Return value is 0.
2. 4 of the 32 pages fail permanently, and the other 28 are migrated successfully.
   2.1 (nr_succeeded, nr_failed, retry, rc) = (28, 4, 0, 4).
   2.2 The from list is empty when it returns.
   2.3 Return value is 4.
3. The first 5 pages are migrated successfully, but the 6th page cannot be migrated due to -EAGAIN.
   3.1 (nr_succeeded, nr_failed, retry, rc) = (5, 0, 1, 1).
   3.2 The size of the from list is 27 when it returns.
   3.3 Return value is 1.
4. The first 5 pages are migrated successfully, the 6th page fails permanently, and the 7th page cannot be migrated due to -EAGAIN.
   4.1 (nr_succeeded, nr_failed, retry, rc) = (5, 1, 1, 2).
   4.2 The size of the from list is 26 when it returns.
   4.3 Return value is 2.

In the 4th case, unmap_and_move() fails to migrate the 6th page, and then fails to migrate the 7th page due to -EAGAIN. The return value rc = 2 correctly indicates how many pages were not migrated, because rc = nr_failed + retry = 1 + 1 = 2. However, only nr_failed = 1 page is accounted to the pgmigrate_fail counter in /proc/vmstat, whereas the correct increment in this case is 2.

/*
 * migrate_pages - migrate the pages specified in a list, to the free pages
 *                 supplied as the target for the page migration
 *
 * @from:               The list of pages to be migrated.
 * @get_new_page:       The function used to allocate free pages to be used
 *                      as the target of the page migration.
 * @put_new_page:       The function used to free target pages if migration
 *                      fails, or NULL if no special handling is necessary.
 * @private:            Private data to be passed on to get_new_page()
 * @mode:               The migration mode that specifies the constraints for
 *                      page migration, if any.
 * @reason:             The reason for page migration.
 *
 * The function returns after 10 attempts or if no pages are movable any more
 * because the list has become empty or no retryable pages exist any more.
 * The caller should call putback_lru_pages() to return pages to the LRU
 * or free list only if ret != 0.
 *
 * Returns the number of pages that were not migrated, or an error code.
 */
int migrate_pages(struct list_head *from, new_page_t get_new_page,
                free_page_t put_new_page, unsigned long private,
                enum migrate_mode mode, int reason)
{
        int retry = 1;
        int nr_failed = 0;
        int nr_succeeded = 0;
        int pass = 0;
        struct page *page;
        struct page *page2;
        int swapwrite = current->flags & PF_SWAPWRITE;
        int rc;

        if (!swapwrite)
                current->flags |= PF_SWAPWRITE;

        for(pass = 0; pass < 10 && retry; pass++) {
                retry = 0;

                list_for_each_entry_safe(page, page2, from, lru) {
                        cond_resched();

                        if (PageHuge(page))
                                rc = unmap_and_move_huge_page(get_new_page,
                                                put_new_page, private, page,
                                                pass > 2, mode);
                        else
                                rc = unmap_and_move(get_new_page, put_new_page,
                                                private, page, pass > 2, mode,
                                                reason);

                        switch(rc) {
                        case -ENOMEM:
                                goto out;
                        case -EAGAIN:
                                retry++;
                                break;
                        case MIGRATEPAGE_SUCCESS:
                                nr_succeeded++;
                                break;
                        default:
                                /*
                                 * Permanent failure (-EBUSY, -ENOSYS, etc.):
                                 * unlike -EAGAIN case, the failed page is
                                 * removed from migration page list and not
                                 * retried in the next outer loop.
                                 */
                                nr_failed++;
                                break;
                        }
                }
        }
        rc = nr_failed + retry;
out:
        if (nr_succeeded)
                count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
        if (nr_failed)
                count_vm_events(PGMIGRATE_FAIL, nr_failed);
        trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);

        if (!swapwrite)
                current->flags &= ~PF_SWAPWRITE;

        return rc;
}
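
The counting in the loop above can be reproduced with a small user-space model. This is only a sketch under the post's assumption that a page returning -EAGAIN keeps returning it. The names enum outcome, struct migrate_stats and model_migrate_pages() are made up for illustration and do not exist in the kernel, and the model only tracks counters, not real pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page outcomes for this model; not kernel definitions. */
enum outcome { OUT_SUCCESS, OUT_EAGAIN, OUT_PERMANENT };

struct migrate_stats {
    int nr_succeeded;
    int nr_failed;  /* what the v4.3 code adds to pgmigrate_fail */
    int retry;      /* pages that still returned -EAGAIN in the last pass */
    int rc;         /* return value: nr_failed + retry */
};

#define MODEL_MAX_PAGES 32

/*
 * Model of the v4.3 retry loop. pages[i] is the fixed outcome of page i,
 * so an OUT_EAGAIN page fails the same way in every pass.
 */
struct migrate_stats model_migrate_pages(const enum outcome *pages, size_t n)
{
    struct migrate_stats s = { 0, 0, 1, 0 };
    int on_list[MODEL_MAX_PAGES];
    size_t i;
    int pass;

    assert(n <= MODEL_MAX_PAGES);
    for (i = 0; i < n; i++)
        on_list[i] = 1;

    for (pass = 0; pass < 10 && s.retry; pass++) {
        s.retry = 0;
        for (i = 0; i < n; i++) {
            if (!on_list[i])
                continue;
            switch (pages[i]) {
            case OUT_EAGAIN:
                s.retry++;          /* stays on the list for the next pass */
                break;
            case OUT_SUCCESS:
                s.nr_succeeded++;
                on_list[i] = 0;
                break;
            default:
                s.nr_failed++;      /* permanent failure, never retried */
                on_list[i] = 0;
                break;
            }
        }
    }
    s.rc = s.nr_failed + s.retry;
    return s;
}
```

For a 7-page list with five OUT_SUCCESS pages, one OUT_PERMANENT page and one OUT_EAGAIN page, the model ends with nr_failed = 1 but rc = 2, which is the gap between the vmstat accounting and the return value described for the 4th case.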

how does this patch fix the problem in branch since-4.3
The patch adds retry to nr_failed (nr_failed += retry) before computing rc. In the 4th case this gives nr_failed = 2, so pgmigrate_fail in /proc/vmstat is increased by 2.

4. The first 5 pages are migrated successfully, the 6th page fails permanently, and the 7th page cannot be migrated due to -EAGAIN.
   4.1 (nr_succeeded, nr_failed, retry, rc) = (5, 2, 1, 2).
   4.2 The size of the from list is 26 when it returns.
   4.3 Return value is 2.

diff --git a/mm/migrate.c b/mm/migrate.c
index 842ecd7..94961f4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1169,7 +1169,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 			}
 		}
 	}
-	rc = nr_failed + retry;
+	nr_failed += retry;
+	rc = nr_failed;
 out:
 	if (nr_succeeded)
 		count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);

conclusion
This post discusses mm, migrate: count pages failing all retries in vmstat and tracepoint, which fixes the incorrect accounting of pgmigrate_fail in /proc/vmstat when a page fails to migrate due to -EAGAIN.
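
To watch the counters discussed in this post on a running system, the pgmigrate_success and pgmigrate_fail lines of /proc/vmstat can be read from user space. The following is a minimal sketch, not a tool from the kernel tree. The helper names vmstat_lookup() and read_vmstat_counter() are invented here, and the code assumes every line of the file has the form "name value".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in /proc/vmstat-style text ("name value" per line).
 * Returns 0 and stores the value on success, -1 if the name is not found.
 */
int vmstat_lookup(FILE *fp, const char *name, unsigned long long *value)
{
    char key[64];
    unsigned long long v;

    assert(fp != NULL && name != NULL && value != NULL);
    while (fscanf(fp, "%63s %llu", key, &v) == 2) {
        if (strcmp(key, name) == 0) {
            *value = v;
            return 0;
        }
    }
    return -1;
}

/* Convenience wrapper around the real file; returns -1 on any failure. */
int read_vmstat_counter(const char *name, unsigned long long *value)
{
    FILE *fp = fopen("/proc/vmstat", "r");
    int ret;

    if (!fp)
        return -1;
    ret = vmstat_lookup(fp, name, value);
    fclose(fp);
    return ret;
}
```

Calling read_vmstat_counter("pgmigrate_fail", &v) before and after a workload that triggers compaction gives the number of failed migrations counted in between.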

patch discussion: mm: fix direct reclaim writeback regression

December 1, 2015

This post discusses mm: fix direct reclaim writeback regression.

merge time
v3.16-rc7

what regression might happen
mm, compaction: return failed migration target pages back to freelist was merged in v3.16-rc1. It reuses isolated free pages by pushing them back to cc->freepages rather than to the buddy system’s PCP list or freelist.

My current guess at the root cause is as follows.

  • Before that patch, free pages that were not used by a successful migration were returned to the buddy system. If the flags of these pages were tainted during the unsuccessful migration, they were reset on the way back to the buddy system.
  • After that patch, such free pages are returned to cc->freepages instead. This does not reset the flags tainted during the unsuccessful migration. When other pages are later migrated into these free pages, the tainted flags are still there and might cause bizarre problems.

what problem is caused by this regression
There were reports that a file cache page had PageSwapBacked(page) set under some conditions, which is very odd.

      WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
      WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()
    Call Trace:
        xfs_vm_writepage+0x5ce/0x630 [xfs]
        shrink_page_list+0x8f9/0xb90
        shrink_inactive_list+0x253/0x510
        shrink_lruvec+0x563/0x6c0
        shrink_zone+0x3b/0x100
        shrink_zones+0x1f1/0x3c0
        try_to_free_pages+0x164/0x380
        __alloc_pages_nodemask+0x822/0xc90
        alloc_pages_vma+0xaf/0x1c0
        handle_mm_fault+0xa31/0xc50
    

how does this patch fix the reported problem
The migration regression might cause this issue because tainted page flags can make PageSwapBacked(page) true. This patch fixes the problem by calling ClearPageSwapBacked() on the unused target page before it is pushed back to cc->freepages.
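
The suspected mechanism can be illustrated with a toy model. This is a sketch of the guess described above, not kernel code: struct toy_page, TOY_PG_SWAPBACKED and the toy_* functions are invented for illustration, and the only behaviour modeled is that a target page inherits the SwapBacked flag before a migration attempt fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for this sketch; not the kernel's struct page or flag bits. */
#define TOY_PG_SWAPBACKED (1u << 0)

struct toy_page {
    unsigned int flags;
};

/*
 * Model a failed migration attempt: the target picks up the source's
 * SwapBacked flag first, then the attempt fails and the target is unused.
 */
void toy_failed_migration(const struct toy_page *src, struct toy_page *target)
{
    assert(src != NULL && target != NULL);
    if (src->flags & TOY_PG_SWAPBACKED)
        target->flags |= TOY_PG_SWAPBACKED;
    /* ... the migration fails here, so the target goes back to its owner ... */
}

/* Return an unused target to the private free list, optionally clearing the flag. */
void toy_put_back(struct toy_page *target, bool clear_swapbacked)
{
    assert(target != NULL);
    if (clear_swapbacked)
        target->flags &= ~TOY_PG_SWAPBACKED;
}

/* Reuse the target for a file page and report whether it looks swap backed. */
bool toy_reused_page_is_swapbacked(const struct toy_page *target)
{
    assert(target != NULL);
    return (target->flags & TOY_PG_SWAPBACKED) != 0;
}
```

Without the clearing step, a target that was first offered to a swap-backed page still carries the flag when it is reused for a file page. With the clearing step, the flag is gone before reuse.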

are there better ways to fix this problem
If my guess at the root cause is correct, there are two approaches.

  • Fix the migration function that taints the page flags.
  • Reset the page flags so that the reused pages look the same as new free pages.

I think the first way is more correct, since unsuccessful migrations shouldn’t taint the flags of free pages. The second way is easier to implement but might be inefficient. Both ways should be general enough to fix all the bizarre problems caused by page flags tainted during unsuccessful migrations.

conclusion
This post discusses how mm: fix direct reclaim writeback regression fixes a regression caused by the new migration design. So far I guess it is related to page flags tainted during migration. If that is true, I think there might be two more general approaches to fixing this kind of problem.

kernel: mm: about migration in v3.15

November 30, 2015

This post discusses migration in v3.15.

what is migration
Migration moves the contents of pages into other free pages. For example, an anonymous page can be migrated by copying the contents of its page structure, copying the contents of the physical page, and updating the corresponding vma and page table.

who are the users of migrate_pages()
The CMA allocator and the compaction mechanism, among others.

how move_to_new_page() moves pages
Generally, move_to_new_page() migrates a page by calling page->mapping->a_ops->migratepage(). If a filesystem provides a migratepage callback, its pages are migrated through that callback; otherwise move_to_new_page() uses fallback_migrate_page(), as the listing below shows.

/*
 * Move a page to a newly allocated page
 * The page is locked and all ptes have been successfully removed.
 *
 * The new page will have replaced the old page if this function
 * is successful.
 *
 * Return value:
 *   < 0 - error code
 *  MIGRATEPAGE_SUCCESS - success
 */
static int move_to_new_page(struct page *newpage, struct page *page,
                                int remap_swapcache, enum migrate_mode mode)
{
        struct address_space *mapping;
        int rc;

        /*
         * Block others from accessing the page when we get around to
         * establishing additional references. We are the only one
         * holding a reference to the new page at this point.
         */
        if (!trylock_page(newpage))
                BUG();

        /* Prepare mapping for the new page.*/
        newpage->index = page->index;
        newpage->mapping = page->mapping;
        if (PageSwapBacked(page))
                SetPageSwapBacked(newpage);

        mapping = page_mapping(page);
        if (!mapping)
                rc = migrate_page(mapping, newpage, page, mode);
        else if (mapping->a_ops->migratepage)
                /*
                 * Most pages have a mapping and most filesystems provide a
                 * migratepage callback. Anonymous pages are part of swap
                 * space which also has its own migratepage callback. This
                 * is the most common path for page migration.
                 */
                rc = mapping->a_ops->migratepage(mapping,
                                                newpage, page, mode);
        else
                rc = fallback_migrate_page(mapping, newpage, page, mode);

        if (rc != MIGRATEPAGE_SUCCESS) {
                newpage->mapping = NULL;
        } else {
                if (remap_swapcache)
                        remove_migration_ptes(page, newpage);
                page->mapping = NULL;
        }

        unlock_page(newpage);

        return rc;
}
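
The three-way dispatch in move_to_new_page() can be modeled outside the kernel with a function pointer table. This is a simplified sketch: all toy_* types and functions are hypothetical, the model reads page->mapping directly instead of calling page_mapping(), and each path only returns a tag that says which branch was taken.

```c
#include <assert.h>
#include <stddef.h>

/* Toy types for this sketch; the real ones live in the kernel headers. */
struct toy_page;
struct toy_mapping;

struct toy_aops {
    /* Optional per-filesystem callback, NULL when the filesystem has none. */
    int (*migratepage)(struct toy_mapping *mapping,
                       struct toy_page *newpage, struct toy_page *page);
};

struct toy_mapping { const struct toy_aops *a_ops; };
struct toy_page { struct toy_mapping *mapping; };

enum toy_path { TOY_PATH_GENERIC = 1, TOY_PATH_CALLBACK, TOY_PATH_FALLBACK };

/* Stand-in for migrate_page(): used when the page has no mapping. */
int toy_migrate_page(struct toy_page *newpage, struct toy_page *page)
{
    (void)newpage; (void)page;
    return TOY_PATH_GENERIC;
}

/* Stand-in for a filesystem callback such as buffer_migrate_page(). */
int toy_fs_migratepage(struct toy_mapping *mapping,
                       struct toy_page *newpage, struct toy_page *page)
{
    (void)mapping; (void)newpage; (void)page;
    return TOY_PATH_CALLBACK;
}

/* Stand-in for fallback_migrate_page(): mapping without a callback. */
int toy_fallback_migrate_page(struct toy_page *newpage, struct toy_page *page)
{
    (void)newpage; (void)page;
    return TOY_PATH_FALLBACK;
}

/* Same branch order as move_to_new_page(): no mapping, callback, fallback. */
int toy_move_to_new_page(struct toy_page *newpage, struct toy_page *page)
{
    struct toy_mapping *mapping;

    assert(newpage != NULL && page != NULL);
    mapping = page->mapping;
    if (!mapping)
        return toy_migrate_page(newpage, page);
    if (mapping->a_ops->migratepage)
        return mapping->a_ops->migratepage(mapping, newpage, page);
    return toy_fallback_migrate_page(newpage, page);
}
```

A page whose mapping has a migratepage entry takes the callback path, which is the common case the kernel comment describes. A mapping with a NULL entry takes the fallback path.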
    

migratepage callback of different filesystems

  • The nfs filesystem uses nfs_migrate_page() as migratepage() callback.

const struct address_space_operations nfs_file_aops = {
        .readpage = nfs_readpage,
        .readpages = nfs_readpages,
        .set_page_dirty = __set_page_dirty_nobuffers,
        .writepage = nfs_writepage,
        .writepages = nfs_writepages,
        .write_begin = nfs_write_begin,
        .write_end = nfs_write_end,
        .invalidatepage = nfs_invalidate_page,
        .releasepage = nfs_release_page,
        .direct_IO = nfs_direct_IO,
        .migratepage = nfs_migrate_page,
        .launder_page = nfs_launder_page,
        .is_dirty_writeback = nfs_check_dirty_writeback,
        .error_remove_page = generic_error_remove_page,
#ifdef CONFIG_NFS_SWAP
        .swap_activate = nfs_swap_activate,
        .swap_deactivate = nfs_swap_deactivate,
#endif
};

  • The ext4 filesystem uses buffer_migrate_page() as migratepage() callback.

static const struct address_space_operations ext4_aops = {
        .readpage               = ext4_readpage,
        .readpages              = ext4_readpages,
        .writepage              = ext4_writepage,
        .writepages             = ext4_writepages,
        .write_begin            = ext4_write_begin,
        .write_end              = ext4_write_end,
        .bmap                   = ext4_bmap,
        .invalidatepage         = ext4_invalidatepage,
        .releasepage            = ext4_releasepage,
        .direct_IO              = ext4_direct_IO,
        .migratepage            = buffer_migrate_page,
        .is_partially_uptodate  = block_is_partially_uptodate,
        .error_remove_page      = generic_error_remove_page,
};

migration API
The most convenient migration API is migrate_pages(). The from argument is a list of isolated pages. The get_new_page argument is a callback used to allocate new pages as migration targets.

The return value of migrate_pages()

  • It returns a negative error code on failure.
  • Otherwise, it returns the number of pages that were not migrated. A return value of zero means that all pages were migrated and the from list is empty.

/*
 * migrate_pages - migrate the pages specified in a list, to the free pages
 *                 supplied as the target for the page migration
 *
 * @from:               The list of pages to be migrated.
 * @get_new_page:       The function used to allocate free pages to be used
 *                      as the target of the page migration.
 * @private:            Private data to be passed on to get_new_page()
 * @mode:               The migration mode that specifies the constraints for
 *                      page migration, if any.
 * @reason:             The reason for page migration.
 *
 * The function returns after 10 attempts or if no pages are movable any more
 * because the list has become empty or no retryable pages exist any more.
 * The caller should call putback_lru_pages() to return pages to the LRU
 * or free list only if ret != 0.
 *
 * Returns the number of pages that were not migrated, or an error code.
 */
int migrate_pages(struct list_head *from, new_page_t get_new_page,
                unsigned long private, enum migrate_mode mode, int reason)
{
        int retry = 1;
        int nr_failed = 0;
        int nr_succeeded = 0;
        int pass = 0;
        struct page *page;
        struct page *page2;
        int swapwrite = current->flags & PF_SWAPWRITE;
        int rc;

        if (!swapwrite)
                current->flags |= PF_SWAPWRITE;

        for(pass = 0; pass < 10 && retry; pass++) {
                retry = 0;

                list_for_each_entry_safe(page, page2, from, lru) {
                        cond_resched();

                        if (PageHuge(page))
                                rc = unmap_and_move_huge_page(get_new_page,
                                                private, page, pass > 2, mode);
                        else
                                rc = unmap_and_move(get_new_page, private,
                                                page, pass > 2, mode);

                        switch(rc) {
                        case -ENOMEM:
                                goto out;
                        case -EAGAIN:
                                retry++;
                                break;
                        case MIGRATEPAGE_SUCCESS:
                                nr_succeeded++;
                                break;
                        default:
                                /*
                                 * Permanent failure (-EBUSY, -ENOSYS, etc.):
                                 * unlike -EAGAIN case, the failed page is
                                 * removed from migration page list and not
                                 * retried in the next outer loop.
                                 */
                                nr_failed++;
                                break;
                        }
                }
        }
        rc = nr_failed + retry;
out:
        if (nr_succeeded)
                count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
        if (nr_failed)
                count_vm_events(PGMIGRATE_FAIL, nr_failed);
        trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);

        if (!swapwrite)
                current->flags &= ~PF_SWAPWRITE;

        return rc;
}
    

conclusion
This post discusses migration in v3.15. It also describes move_to_new_page() and migrate_pages().

