# Allocation Policy

There is some NUMA awareness behind the (book) allocation algorithm used when sizing a file. The default allocation policy chooses books at random from all nodes that still have books available. The default policy is stored as an extended attribute of the LFS mount point on each node, typically /lfs. See the man pages for attr(5), xattr(7), getfattr(1), and setfattr(1), plus the related system calls for programmatic use. To see the current default policy on your system, log into a node and execute:

```
$ getfattr -n user.LFS.AllocationPolicyDefault /lfs
getfattr: Removing leading '/' from absolute path names
# file: lfs
user.LFS.AllocationPolicyDefault="RandomBooks"
```
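The default can be changed with setfattr. A minimal sketch, assuming your installation permits rewriting the attribute (any policy from the AllocationPolicyList shown below is a legal value):

```
$ sudo setfattr -n user.LFS.AllocationPolicyDefault -v Nearest /lfs
$ getfattr -n user.LFS.AllocationPolicyDefault /lfs
getfattr: Removing leading '/' from absolute path names
# file: lfs
user.LFS.AllocationPolicyDefault="Nearest"
```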

When a file is created, the default policy becomes that file's starting policy. The file's policy can be changed at any time; future allocations follow whatever policy is current. Books are deallocated from a file on a LIFO basis (i.e., simple truncation). A sketch of both operations follows the listing below.

```
$ touch /lfs/newfile
$ getfattr -d /lfs/newfile
getfattr: Removing leading '/' from absolute path names
# file: lfs/newfile
user.LFS.AllocationPolicy="RandomBooks"
user.LFS.AllocationPolicyList="RandomBooks,LocalNode,LocalEnc,NonLocal_Enc,Nearest,NearestRemote,NearestEnc,NearestRack,LZAascending,LZAdescending,RequestIG"
user.LFS.Interleave
user.LFS.InterleaveRequest
user.LFS.InterleaveRequestPos
```
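For example, to switch the new file to enclosure-local allocation and then release its books (the policy choice here is just for illustration):

```
$ setfattr -n user.LFS.AllocationPolicy -v LocalEnc /lfs/newfile
$ truncate -s 0 /lfs/newfile    # books come back in LIFO order
```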

Consider a 20-node system with enclosures 1 and 2 fully populated (nodes 1-20 exist), and an application running on node 15 (in enclosure 2). Here is how the allocation policies play out. An allocation that runs out of space returns an ENOSPC error.

| Policy | Description | Candidate nodes |
| --- | --- | --- |
| RandomBooks | Any node with free books | 1-20 |
| LocalNode | Only the node running the application | 15 |
| LocalEnc | Nodes in the same enclosure | 11-20 |
| NonLocal_Enc | Nodes in the same enclosure except this node | 11-14 & 16-20 |
| Nearest | Start with LocalNode until it's full, then the local enclosure, then anywhere | 15, then 11-14 & 16-20, finally 1-10 |
| NearestEnc | Like Nearest, but skip LocalNode | 11-14 & 16-20, finally 1-10 |
| NearestRack | Not in this enclosure | 1-10 |
| NearestRemote | Not this node | 1-14 & 16-20 |
| RequestIG | Follow a specific list of nodes, with wraparound | See following text |
| LZAascending | Ordered blocks from lowest node to highest, useful in Librarian development | 1, 2, 3, ... 19, 20 |
| LZAdescending | Reverse of LZAascending | 20, 19, 18, ... 2, 1 |
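As a sketch of this scenario, assuming (as the opening paragraph suggests) that books are allocated when the file is sized, e.g. via truncate, the application on node 15 could keep a file's books as close as possible like so:

```
$ touch /lfs/nearfile
$ setfattr -n user.LFS.AllocationPolicy -v Nearest /lfs/nearfile
$ truncate -s 100G /lfs/nearfile   # books come from node 15 until it fills,
                                   # then nodes 11-14 & 16-20, finally 1-10
```

Dumping user.LFS.Interleave afterwards (as in the RequestIG example below) would show 0x0e, node 15 numbered from zero, repeated once per book for as long as node 15 has free books.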

The RequestIG policy must be accompanied by another attribute holding the node list. The list is one byte per book, node numbering starts at zero, and the list is repeated (wrapping around) as needed. For example, to allocate books for a file from only nodes 3 and 4:

```
$ touch /lfs/only3and4
$ setfattr -n user.LFS.InterleaveRequest -v 0x0203 /lfs/only3and4
$ getfattr -d /lfs/only3and4
getfattr: Removing leading '/' from absolute path names
# file: lfs/only3and4
user.LFS.AllocationPolicy="RequestIG"
user.LFS.AllocationPolicyList="RandomBooks,LocalNode,LocalEnc,NonLocal_Enc,Nearest,NearestRemote,NearestEnc,NearestRack,LZAascending,LZAdescending,RequestIG"
user.LFS.Interleave
user.LFS.InterleaveRequest=0sAgM=
user.LFS.InterleaveRequestPos="0"
```

The Interleave attributes hold binary values; a plain getfattr -d dump shows them base64-encoded (the 0s prefix above), so pass the -e hex argument to display them in hex. InterleaveRequestPos is the index in the list of the next node to use for allocation; it is usually left alone. (It is stored as an ASCII string, so the 0x31 below is the character '1'.) After allocating 5 books for the file, look at the actual allocation:

```
$ getfattr -e hex -d /lfs/only3and4 | grep Interleave
user.LFS.Interleave=0x0203020302
user.LFS.InterleaveRequest=0x0203
user.LFS.InterleaveRequestPos=0x31
```
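To pull just the raw bytes of a single attribute, getfattr's --only-values switch can be piped through a hex dumper; a sketch:

```
$ getfattr --only-values -n user.LFS.InterleaveRequest /lfs/only3and4 | xxd -p
0203
```

Writing InterleaveRequestPos with setfattr would shift the starting index of the next allocation, but as noted above it is usually left alone.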

This data can also be seen on the ToRMS by examining the SQLite database at the heart of the Librarian. There are five wrapper scripts to assist with this: lslfs, lllfs, dulfs, dflfs, and alloclfs.