Automatic merge of 'master' into merge (2024-05-21 13:04)

mpe committed May 21, 2024
2 parents 415e450 + 70ec81c commit fac5bef

Showing 1,794 changed files with 69,602 additions and 23,321 deletions.
2 changes: 1 addition & 1 deletion Documentation/ABI/testing/sysfs-fs-f2fs
@@ -331,7 +331,7 @@ Date: January 2018
Contact: Jaegeuk Kim <jaegeuk@kernel.org>
Description: This indicates how many GC failures are allowed for the pinned
file. If it exceeds this, F2FS doesn't guarantee its pinning
state. 2048 trials is set by default.
state. 2048 trials is set by default, and 65535 is the maximum.

What: /sys/fs/f2fs/<disk>/extension_list
Date: February 2018
6 changes: 3 additions & 3 deletions Documentation/ABI/testing/sysfs-kernel-mm-damon
@@ -314,9 +314,9 @@ Date: Dec 2022
Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the type of
the memory of the interest. 'anon' for anonymous pages,
'memcg' for specific memory cgroup, 'addr' for address range
(an open-ended interval), or 'target' for DAMON monitoring
target can be written and read.
'memcg' for specific memory cgroup, 'young' for young pages,
'addr' for address range (an open-ended interval), or 'target'
for DAMON monitoring target can be written and read.

What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/memcg_path
Date: Dec 2022
18 changes: 18 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-mm-transparent-hugepage
@@ -0,0 +1,18 @@
What: /sys/kernel/mm/transparent_hugepage/
Date: April 2024
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description:
/sys/kernel/mm/transparent_hugepage/ contains a number of files and
subdirectories,

- defrag
- enabled
- hpage_pmd_size
- khugepaged
- shmem_enabled
- use_zero_page
- subdirectories of the form hugepages-<size>kB, where <size>
is the page size of the hugepages supported by the kernel/CPU
combination.

See Documentation/admin-guide/mm/transhuge.rst for details.
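
For example, the entries listed above can be inspected directly (a minimal
sketch; which hugepages-<size>kB subdirectories exist depends on the
kernel/CPU combination)::

# list the tunables and per-size subdirectories
ls /sys/kernel/mm/transparent_hugepage/
# read one of the documented files, e.g. the PMD hugepage size in bytes
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
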
8 changes: 4 additions & 4 deletions Documentation/Makefile
@@ -80,22 +80,22 @@ loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit;
# * dest folder relative to $(BUILDDIR) and
# * cache folder relative to $(BUILDDIR)/.doctrees
# $4 dest subfolder e.g. "man" for man pages at userspace-api/media/man
# $5 reST source folder relative to $(srctree)/$(src),
# $5 reST source folder relative to $(src),
# e.g. "userspace-api/media" for the linux-tv book-set at ./Documentation/userspace-api/media

quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/userspace-api/media $2 && \
PYTHONDONTWRITEBYTECODE=1 \
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(src)/$5/$(SPHINX_CONF)) \
$(PYTHON3) $(srctree)/scripts/jobserver-exec \
$(CONFIG_SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \
$(SPHINXBUILD) \
-b $2 \
-c $(abspath $(srctree)/$(src)) \
-c $(abspath $(src)) \
-d $(abspath $(BUILDDIR)/.doctrees/$3) \
-D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
$(ALLSPHINXOPTS) \
$(abspath $(srctree)/$(src)/$5) \
$(abspath $(src)/$5) \
$(abspath $(BUILDDIR)/$3/$4) && \
if [ "x$(DOCS_CSS)" != "x" ]; then \
cp $(if $(patsubst /%,,$(DOCS_CSS)),$(abspath $(srctree)/$(DOCS_CSS)),$(DOCS_CSS)) $(BUILDDIR)/$3/_static/; \
5 changes: 5 additions & 0 deletions Documentation/admin-guide/blockdev/zram.rst
@@ -466,6 +466,11 @@ of equal or greater size:::
#recompress idle pages larger than 2000 bytes
echo "type=idle threshold=2000" > /sys/block/zramX/recompress

It is also possible to limit the number of pages zram re-compression will
attempt to recompress:::

echo "type=huge_idle max_pages=42" > /sys/block/zramX/recompress

Recompression of idle pages requires memory tracking.

During re-compression for every page, that matches re-compression criteria,
7 changes: 6 additions & 1 deletion Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -568,7 +568,7 @@ on the next tick. For some applications in special situation, waiting

The 'cpuset.sched_relax_domain_level' file allows you to request changing
this searching range as you like. This file takes int value which
indicates size of searching range in levels ideally as follows,
indicates size of searching range in levels approximately as follows,
otherwise initial value -1 that indicates the cpuset has no request.

====== ===========================================================
@@ -581,6 +581,11 @@ otherwise initial value -1 that indicates the cpuset has no request.
5 search system wide [on NUMA system]
====== ===========================================================

Not all levels can be present and values can change depending on the
system architecture and kernel configuration. Check
/sys/kernel/debug/sched/domains/cpu*/domain*/ for system-specific
details.

The system default is architecture dependent. The system default
can be changed using the relax_domain_level= boot parameter.
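
For example (a sketch assuming the v1 cpuset hierarchy is mounted at
/sys/fs/cgroup/cpuset and a cpuset named "my_set" exists)::

# request system wide search on a NUMA system (level 5 in the table above)
echo 5 > /sys/fs/cgroup/cpuset/my_set/cpuset.sched_relax_domain_level
# inspect the scheduling domains actually built on this system
ls /sys/kernel/debug/sched/domains/cpu0/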

8 changes: 4 additions & 4 deletions Documentation/admin-guide/cgroup-v1/memory.rst
@@ -300,14 +300,14 @@ When oom event notifier is registered, event will be delivered.

Lock order is as follows::

Page lock (PG_locked bit of page->flags)
folio_lock
mm->page_table_lock or split pte_lock
folio_memcg_lock (memcg->move_lock)
mapping->i_pages lock
lruvec->lru_lock.

Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
lruvec->lru_lock; PG_lru bit of page->flags is cleared before
lruvec->lru_lock; the folio LRU flag is cleared before
isolating a page from its LRU under lruvec->lru_lock.

.. _cgroup-v1-memory-kernel-extension:
@@ -802,8 +802,8 @@ a page or a swap can be moved only when it is charged to the task's current
| | anonymous pages, file pages (and swaps) in the range mmapped by the task |
| | will be moved even if the task hasn't done page fault, i.e. they might |
| | not be the task's "RSS", but other task's "RSS" that maps the same file. |
| | And mapcount of the page is ignored (the page can be moved even if |
| | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to |
| | The mapcount of the page is ignored (the page can be moved independent |
| | of the mapcount). You must enable Swap Extension (see 2.4) to |
| | enable move of swap charges. |
+---+--------------------------------------------------------------------------+

2 changes: 1 addition & 1 deletion Documentation/admin-guide/cgroup-v2.rst
@@ -1435,7 +1435,7 @@ PAGE_SIZE multiple when read back.
sec_pagetables
Amount of memory allocated for secondary page tables,
this currently includes KVM mmu allocations on x86
and arm64.
and arm64 and IOMMU page tables.

percpu (npn)
Amount of memory used for storing per-cpu kernel
8 changes: 4 additions & 4 deletions Documentation/admin-guide/kdump/kdump.rst
@@ -136,10 +136,6 @@ System kernel config options

CONFIG_KEXEC_CORE=y

Subsequently, CRASH_CORE is selected by KEXEC_CORE::

CONFIG_CRASH_CORE=y

2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
filesystems." This is usually enabled by default::

@@ -168,6 +164,10 @@ Dump-capture kernel config options (Arch Independent)

CONFIG_CRASH_DUMP=y

And this will select VMCORE_INFO and CRASH_RESERVE::

CONFIG_VMCORE_INFO=y
CONFIG_CRASH_RESERVE=y

2) Enable "/proc/vmcore support" under "Filesystems" -> "Pseudo filesystems"::

CONFIG_PROC_VMCORE=y
11 changes: 9 additions & 2 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -2151,6 +2151,12 @@
Format: 0 | 1
Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.

init_mlocked_on_free= [MM] Fill freed userspace memory with zeroes if
it was mlock'ed and not explicitly munlock'ed
afterwards.
Format: 0 | 1
Default set by CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON

init_pkru= [X86] Specify the default memory protection keys rights
register contents for all processes. 0x55555554 by
default (disallow access to all but pkey 0). Can
@@ -3781,10 +3787,12 @@
Format: [state][,regs][,debounce][,die]

nmi_watchdog= [KNL,BUGS=X86] Debugging features for SMP kernels
Format: [panic,][nopanic,][num]
Format: [panic,][nopanic,][rNNN,][num]
Valid num: 0 or 1
0 - turn hardlockup detector in nmi_watchdog off
1 - turn hardlockup detector in nmi_watchdog on
rNNN - configure the watchdog with raw perf event 0xNNN

When panic is specified, panic when an NMI watchdog
timeout occurs (or 'nopanic' to not panic on an NMI
watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)
@@ -7501,4 +7509,3 @@
memory, and other data can't be written using
xmon commands.
off xmon is disabled.

32 changes: 16 additions & 16 deletions Documentation/admin-guide/mm/damon/usage.rst
@@ -153,7 +153,7 @@ Users can write below commands for the kdamond to the ``state`` file.
- ``clear_schemes_tried_regions``: Clear the DAMON-based operating scheme
action tried regions directory for each DAMON-based operation scheme of the
kdamond.
- ``update_schemes_effective_bytes``: Update the contents of
- ``update_schemes_effective_quotas``: Update the contents of
``effective_bytes`` files for each DAMON-based operation scheme of the
kdamond. For more details, refer to :ref:`quotas directory <sysfs_quotas>`.

@@ -342,7 +342,7 @@ Based on the user-specified :ref:`goal <sysfs_schemes_quota_goals>`, the
effective size quota is further adjusted. Reading ``effective_bytes`` returns
the current effective size quota. The file is not updated in real time, so
users should ask DAMON sysfs interface to update the content of the file for
the stats by writing a special keyword, ``update_schemes_effective_bytes`` to
the stats by writing a special keyword, ``update_schemes_effective_quotas`` to
the relevant ``kdamonds/<N>/state`` file.
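
For example, a minimal sketch assuming a single kdamond and scheme, both at
index 0::

# refresh the effective_bytes files, then read one of them
echo update_schemes_effective_quotas > /sys/kernel/mm/damon/admin/kdamonds/0/state
cat /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/quotas/effective_bytes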

Under ``weights`` directory, three files (``sz_permil``,
@@ -410,19 +410,19 @@ in the numeric order.

Each filter directory contains six files, namely ``type``, ``matching``,
``memcg_path``, ``addr_start``, ``addr_end``, and ``target_idx``. To ``type``
file, you can write one of four special keywords: ``anon`` for anonymous pages,
``memcg`` for specific memory cgroup, ``addr`` for specific address range (an
open-ended interval), or ``target`` for specific DAMON monitoring target
filtering. In case of the memory cgroup filtering, you can specify the memory
cgroup of the interest by writing the path of the memory cgroup from the
cgroups mount point to ``memcg_path`` file. In case of the address range
filtering, you can specify the start and end address of the range to
``addr_start`` and ``addr_end`` files, respectively. For the DAMON monitoring
target filtering, you can specify the index of the target between the list of
the DAMON context's monitoring targets list to ``target_idx`` file. You can
write ``Y`` or ``N`` to ``matching`` file to filter out pages that does or does
not match to the type, respectively. Then, the scheme's action will not be
applied to the pages that specified to be filtered out.
file, you can write one of five special keywords: ``anon`` for anonymous pages,
``memcg`` for specific memory cgroup, ``young`` for young pages, ``addr`` for
specific address range (an open-ended interval), or ``target`` for specific
DAMON monitoring target filtering. In case of the memory cgroup filtering, you
can specify the memory cgroup of the interest by writing the path of the memory
cgroup from the cgroups mount point to ``memcg_path`` file. In case of the
address range filtering, you can specify the start and end address of the range
to ``addr_start`` and ``addr_end`` files, respectively. For the DAMON
monitoring target filtering, you can specify the index of the target between
the list of the DAMON context's monitoring targets list to ``target_idx`` file.
You can write ``Y`` or ``N`` to the ``matching`` file to filter out pages that
do or do not match the type, respectively. Then, the scheme's action will
not be applied to the pages specified to be filtered out.

For example, below restricts a DAMOS action to be applied to only non-anonymous
pages of all memory cgroups except ``/having_care_already``.::
@@ -434,7 +434,7 @@ pages of all memory cgroups except ``/having_care_already``.::
# # further filter out all cgroups except one at '/having_care_already'
echo memcg > 1/type
echo /having_care_already > 1/memcg_path
echo N > 1/matching
echo Y > 1/matching

Note that ``anon`` and ``memcg`` filters are currently supported only when
``paddr`` :ref:`implementation <sysfs_context>` is being used.
7 changes: 7 additions & 0 deletions Documentation/admin-guide/mm/hugetlbpage.rst
@@ -376,6 +376,13 @@ Note that the number of overcommit and reserve pages remain global quantities,
as we don't know until fault time, when the faulting task's mempolicy is
applied, from which node the huge page allocation will be attempted.

Hugetlb pages may be migrated between the per-node hugepage pools in the
following scenarios: memory offline, memory failure, longterm pinning,
syscalls (mbind, migrate_pages and move_pages), alloc_contig_range() and
alloc_contig_pages(). Currently, only memory offline, memory failure and the
syscalls allow falling back to allocating a new hugetlb page on a different
node if the current node is unable to allocate during hugetlb migration; this
means these three cases can break the per-node hugepages pool.

.. _using_huge_pages:

Using Huge Pages
35 changes: 32 additions & 3 deletions Documentation/admin-guide/mm/transhuge.rst
@@ -278,7 +278,8 @@ collapsed, resulting fewer pages being collapsed into
THPs, and lower memory access performance.

``max_ptes_shared`` specifies how many pages can be shared across multiple
processes. Exceeding the number would block the collapse::
processes. khugepaged might treat pages of THPs as shared if any page of
that THP is shared. Exceeding the number would block the collapse::

/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
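
For instance (a sketch; 8 is an arbitrary illustrative value)::

# allow at most 8 shared PTEs in a collapse candidate
echo 8 > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared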

@@ -369,15 +370,15 @@ monitor how successfully the system is providing huge pages for use.

thp_fault_alloc
is incremented every time a huge page is successfully
allocated to handle a page fault.
allocated and charged to handle a page fault.

thp_collapse_alloc
is incremented by khugepaged when it has found
a range of pages to collapse into one huge page and has
successfully allocated a new huge page to store the data.

thp_fault_fallback
is incremented if a page fault fails to allocate
is incremented if a page fault fails to allocate or charge
a huge page and instead falls back to using small pages.

thp_fault_fallback_charge
@@ -447,6 +448,34 @@ thp_swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.

In /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats, there are
also individual counters for each huge page size, which can be used to
monitor how effectively the system provides huge pages for use. Each
counter has its own corresponding file.

anon_fault_alloc
is incremented every time a huge page is successfully
allocated and charged to handle a page fault.

anon_fault_fallback
is incremented if a page fault fails to allocate or charge
a huge page and instead falls back to using huge pages with
lower orders or small pages.

anon_fault_fallback_charge
is incremented if a page fault fails to charge a huge page and
instead falls back to using huge pages with lower orders or
small pages even though the allocation was successful.

anon_swpout
is incremented every time a huge page is swapped out in one
piece without splitting.

anon_swpout_fallback
is incremented if a huge page has to be split before swapout.
Usually because failed to allocate some continuous swap space
for the huge page.
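
For example, the per-size counters can be read directly (a sketch;
hugepages-64kB is just one size a given kernel/CPU combination may offer)::

cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/anon_fault_alloc
cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/anon_swpout_fallback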

As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in ``/proc/vmstat`` to help
29 changes: 0 additions & 29 deletions Documentation/admin-guide/mm/zswap.rst
@@ -111,35 +111,6 @@ checked if it is a same-value filled page before compressing it. If true, the
compressed length of the page is set to zero and the pattern or same-filled
value is stored.

Same-value filled pages identification feature is enabled by default and can be
disabled at boot time by setting the ``same_filled_pages_enabled`` attribute
to 0, e.g. ``zswap.same_filled_pages_enabled=0``. It can also be enabled and
disabled at runtime using the sysfs ``same_filled_pages_enabled``
attribute, e.g.::

echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled

When zswap same-filled page identification is disabled at runtime, it will stop
checking for the same-value filled pages during store operation.
In other words, every page will be then considered non-same-value filled.
However, the existing pages which are marked as same-value filled pages remain
stored unchanged in zswap until they are either loaded or invalidated.

In some circumstances it might be advantageous to make use of just the zswap
ability to efficiently store same-filled pages without enabling the whole
compressed page storage.
In this case the handling of non-same-value pages by zswap (enabled by default)
can be disabled by setting the ``non_same_filled_pages_enabled`` attribute
to 0, e.g. ``zswap.non_same_filled_pages_enabled=0``.
It can also be enabled and disabled at runtime using the sysfs
``non_same_filled_pages_enabled`` attribute, e.g.::

echo 1 > /sys/module/zswap/parameters/non_same_filled_pages_enabled

Disabling both ``zswap.same_filled_pages_enabled`` and
``zswap.non_same_filled_pages_enabled`` effectively disables accepting any new
pages by zswap.

To prevent zswap from shrinking pool when zswap is full and there's a high
pressure on swap (this will result in flipping pages in and out zswap pool
without any real benefit but with a performance drop for the system), a
16 changes: 16 additions & 0 deletions Documentation/admin-guide/sysctl/vm.rst
@@ -43,6 +43,7 @@ Currently, these files are in /proc/sys/vm:
- legacy_va_layout
- lowmem_reserve_ratio
- max_map_count
- mem_profiling (only if CONFIG_MEM_ALLOC_PROFILING=y)
- memory_failure_early_kill
- memory_failure_recovery
- min_free_kbytes
@@ -425,6 +426,21 @@ e.g., up to one or two maps per allocation.
The default value is 65530.


mem_profiling
==============

Enable memory profiling (when CONFIG_MEM_ALLOC_PROFILING=y)

1: Enable memory profiling.

0: Disable memory profiling.

Enabling memory profiling introduces a small performance overhead for all
memory allocations.

The default value depends on CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.
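
For example (a sketch; both forms toggle the same knob)::

# enable memory profiling at runtime
echo 1 > /proc/sys/vm/mem_profiling
# or, equivalently
sysctl vm.mem_profiling=1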


memory_failure_early_kill:
==========================
