Merge pull request #26 from caps-tum/mt4g-refactor
Mt4g refactor
stepanvanecek authored Feb 19, 2024
2 parents 8b85749 + 5200253 commit 2c934c1
Showing 19 changed files with 491 additions and 203 deletions.
135 changes: 135 additions & 0 deletions docs/Data_Parsers.md
@@ -0,0 +1,135 @@
# sys-sage Data Parsers Documentation

## Available Parsers
- [hwloc](#hwloc) (CPU topology)
- [mt4g](#mt4g) (GPU topology)

<a id="hwloc"></a>
### hwloc (CPU topology)
//TODO



<a id="mt4g"></a>
### mt4g (GPU topology)
Parser for the output of the mt4g project ( https://github.com/caps-tum/mt4g ). mt4g captures the memory topology of Nvidia GPUs, specifically all GPUs since the Kepler microarchitecture. It is a set of microbenchmarks that uncover the hidden structure and attributes of modern GPUs and present them to the user for further processing.

#### General

With mt4g, one can generate a .csv output file, which contains the GPU topology information and attributes regarding the GPU. This .csv is a sys-sage Data Source, which is parsed by the mt4g Data Parser.
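
As a minimal sketch of the first parsing step, the helper below splits one line of the .csv into fields. The function name `splitFields` is hypothetical and not part of sys-sage. The `";"` default mirrors the delimiter that the example programs in this repository pass to the parser. The actual column layout of the mt4g output is not assumed here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the sys-sage implementation): split one line of
// a delimiter-separated file into its fields.
std::vector<std::string> splitFields(const std::string& line,
                                     const std::string& delim = ";")
{
    std::vector<std::string> fields;
    if (delim.empty()) {            // guard: an empty delimiter would never advance
        fields.push_back(line);
        return fields;
    }
    std::size_t start = 0;
    while (true) {
        std::size_t pos = line.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));   // last (or only) field
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + delim.size();
    }
    return fields;
}
```

A caller would typically use the first field to decide which kind of line (for example L2_DATA_CACHE) it is looking at before interpreting the remaining fields.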

#### Parsing Logic
The mt4g Parser creates a new GPU topology representation, starting at the GPU level (as a Chip component with chip type SYS_SAGE_CHIP_TYPE_GPU).

The topology is created with the following hierarchy:
- ```[1 ]``` **GPU** (component Chip, chip type SYS_SAGE_CHIP_TYPE_GPU)
- ```[1..n]``` **Global memory** (component Memory; provided MAIN_MEMORY is Shared_On GPU-level; otherwise error)
- ```[1..n]``` **L2 cache** (component Cache; provided L2_DATA_CACHE is Shared_On GPU-level)
- ```[1..n]``` *other caches -- L1 cache, Texture cache, Read-only cache* (provided they are Shared_On GPU-level)
- ```[1..n]``` **SM** (component Subdivision, subdivisionType SYS_SAGE_SUBDIVISION_TYPE_GPU_SM)
- ```[1..n]``` *L2 cache (provided L2_DATA_CACHE is Shared_On SM-level)*
- ```[1..n]``` **other caches -- L1 cache, Texture cache, Read-only cache** (provided they are Shared_On SM-level) - Either as one object, if they are physically shared, or as separate objects if not.
- ```[1..n]``` **GPU Core** (component HW_Thread) -- child of L1 cache
- ```[1..n]``` **L1.5 Constant cache** (component Cache)
- ```[1 ]``` **L1 Constant cache** (component Cache)
- ```[1 ]``` **Shared memory** (component Memory; provided Shared_On SM-level; otherwise error)
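
The common case of this hierarchy (L2 shared on GPU-level, L1 not shared with other caches) can be sketched as a small tree. `SketchNode`, `buildGpuSketch`, and `countKind` are hypothetical names used only for illustration; they are not the sys-sage classes, and the core ids chosen here are an assumption.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sys-sage component (not the library's class).
struct SketchNode {
    std::string kind;   // "Chip", "Memory", "Cache", "Subdivision", "Thread"
    std::string name;
    int id;
    SketchNode* parent = nullptr;
    std::vector<std::unique_ptr<SketchNode>> children;

    SketchNode(std::string k, std::string n, int i)
        : kind(std::move(k)), name(std::move(n)), id(i) {}

    // Create a child, link it to this node, and return a raw pointer to it.
    SketchNode* add(std::string k, std::string n, int i) {
        children.push_back(
            std::make_unique<SketchNode>(std::move(k), std::move(n), i));
        children.back()->parent = this;
        return children.back().get();
    }
};

// GPU -> Global memory -> L2 -> SMs -> { L1 -> cores, C1.5 -> C1, Shared memory }
std::unique_ptr<SketchNode> buildGpuSketch(int numSMs, int coresPerSM)
{
    auto gpu = std::make_unique<SketchNode>("Chip", "GPU", 0);
    SketchNode* mem = gpu->add("Memory", "GPU main memory", 0);
    SketchNode* l2 = mem->add("Cache", "L2", 0);
    for (int sm = 0; sm < numSMs; ++sm) {
        SketchNode* s = l2->add("Subdivision", "SM (Streaming Multiprocessor)", sm);
        SketchNode* l1 = s->add("Cache", "L1", 0);
        for (int c = 0; c < coresPerSM; ++c)
            l1->add("Thread", "GPU Core", c);        // core ids are an assumption
        SketchNode* c15 = s->add("Cache", "Constant_L1.5", 0);
        c15->add("Cache", "Constant_L1", 0);
        s->add("Memory", "Shared memory", 0);
    }
    return gpu;
}

// Count all nodes of a given kind in the subtree rooted at node.
int countKind(const SketchNode& node, const std::string& kind)
{
    int total = (node.kind == kind) ? 1 : 0;
    for (const auto& child : node.children)
        total += countKind(*child, kind);
    return total;
}
```

For a GPU with 2 SMs and 4 cores per SM, `buildGpuSketch(2, 4)` yields 8 Thread nodes below 2 Subdivision nodes.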

- The **GPU (Chip component)** contains the following information (if found in the CSV lines GPU_INFORMATION, COMPUTE_RESOURCE_INFORMATION, and ADDITIONAL_INFORMATION):
- vendor
- model
- name = "GPU" (if new Chip is being created)
- ```attrib``` (key; value): "CUDA_compute_capability"; string* to the value
- ```attrib``` (key; value): "Number_of_streaming_multiprocessors"; int*
- ```attrib``` (key; value): "Number_of_cores_in_GPU"; int*
- ```attrib``` (key; value): "Number_of_cores_per_SM"; int*
- ```attrib``` (key; value): "GPU_Clock_Rate"; double* (clock rate in Hz)
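
Since the attribute values are stored as typed pointers (`string*`, `int*`, `double*`), a reader has to cast each value back to the right type. The following is a minimal sketch of that pattern, assuming the attributes live in a `std::map<std::string, void*>`. `SketchChip`, `setGpuAttributes`, `getAttrib`, and `freeGpuAttributes` are hypothetical names, not sys-sage API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a Chip with a key -> void* attribute map.
struct SketchChip {
    std::map<std::string, void*> attrib;
};

// Store each value on the heap; the map only keeps untyped pointers.
// Intended to be called once per chip (calling it twice would leak).
void setGpuAttributes(SketchChip& chip, const std::string& computeCapability,
                      int numSMs, int coresInGpu, int coresPerSM, double clockHz)
{
    chip.attrib["CUDA_compute_capability"] = new std::string(computeCapability);
    chip.attrib["Number_of_streaming_multiprocessors"] = new int(numSMs);
    chip.attrib["Number_of_cores_in_GPU"] = new int(coresInGpu);
    chip.attrib["Number_of_cores_per_SM"] = new int(coresPerSM);
    chip.attrib["GPU_Clock_Rate"] = new double(clockHz);   // in Hz
}

// The caller must name the type that was stored under the key.
template <typename T>
const T* getAttrib(const SketchChip& chip, const std::string& key)
{
    auto it = chip.attrib.find(key);
    return it == chip.attrib.end() ? nullptr
                                   : static_cast<const T*>(it->second);
}

// Release the values with the same types they were created with.
void freeGpuAttributes(SketchChip& chip)
{
    for (auto& kv : chip.attrib) {
        if (kv.first == "CUDA_compute_capability")
            delete static_cast<std::string*>(kv.second);
        else if (kv.first == "GPU_Clock_Rate")
            delete static_cast<double*>(kv.second);
        else
            delete static_cast<int*>(kv.second);
    }
    chip.attrib.clear();
}
```

Reading a value with the wrong type is undefined behavior, which is why the list above states the pointer type next to each key.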

Each GPU has one Global memory child.

- The **Global memory (Memory component)** contains the following information (if found in line ADDITIONAL_INFORMATION, line MAIN_MEMORY)
- size
- name = "GPU main memory"
- ```attrib``` (key; value): "Clock_Frequency", double* (clock rate in Hz, from field Memory_Clock_Frequency)
- ```attrib``` (key; value): "Bus_Width", int* (in bit, from field Memory_Bus_Width)

The Global memory usually has one or more L2 caches as children. Alternatively, SMs can be children of Global memory, if the L2 cache is Shared_On SM-level.

- The **L2 cache (Cache component)**. There may be multiple L2 cache segments, if these are detected in mt4g benchmarks (Caches_Per_GPU). It contains the following information (if found in line L2_DATA_CACHE)
- cache_type = "L2"
- id = 0
- cache_size
- cache_line_size

The L2 would usually have the SMs as children (Shared_On = GPU-level), but it can also be the other way around (L2 caches as children of the SMs, if Shared_On = SM-level).

- **SM -- Streaming Multiprocessor (Subdivision component)**. Subdivision of type SYS_SAGE_SUBDIVISION_TYPE_GPU_SM. One SM gets created for each SM the GPU has (as in COMPUTE_RESOURCE_INFORMATION - Number_of_streaming_multiprocessors). It contains the following information
- Name = "SM (Streaming Multiprocessor)"
- id - goes from 0 to n-1
- subdivision_type = SYS_SAGE_SUBDIVISION_TYPE_GPU_SM

SMs usually have multiple types of caches and Shared memory as children.

//TODO what if caches are Shared_On GPU-level?

- **L1 cache (Cache component)**. There are as many L1 caches created as specified in Caches_Per_SM (line L1_DATA_CACHE). The L1, Texture, ReadOnly, and Constant L1 caches may be shared on one physical chip -- if this is the case, they will also be represented as one Cache component in sys-sage. If the L1 cache is shared with others, the whole group takes over the values from L1_DATA_CACHE. If no L1 but a Texture cache is present, the group takes over the values from the TEXTURE_CACHE line. If neither L1 nor Texture is present but ReadOnly is, the group takes over the information from the READ-ONLY_CACHE line. The sharing is distinguished by the "cache_type" attribute. The possible options are "L1", "L1+Texture", "L1+ReadOnly", "L1+Constant_L1", "L1+Texture+ReadOnly", "L1+Texture+Constant_L1", "L1+ReadOnly+Constant_L1", "L1+Texture+ReadOnly+Constant_L1", "Texture", "Texture+ReadOnly", "Texture+Constant_L1", "Texture+ReadOnly+Constant_L1", "ReadOnly", "ReadOnly+Constant_L1", "Constant_L1". It contains the following information (if found in line L1_DATA_CACHE)
- cache_type
- id = 0
- Name = "Cache"
- cache_size
- cache_line_size
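
The naming of the "cache_type" attribute and the choice of the CSV line that supplies the values can be sketched as follows. `SharedGroup`, `cacheTypeOf`, and `valueSourceLine` are hypothetical names. The last fallback (a group consisting only of the Constant L1 cache reading from CONSTANT_L1_CACHE) is an assumption, since the text above only spells out the first three cases.

```cpp
#include <string>

// Hypothetical description of one group of physically shared caches.
struct SharedGroup {
    bool l1;
    bool texture;
    bool readOnly;
    bool constantL1;
};

// Join the names of the present caches with '+', in the fixed order
// L1, Texture, ReadOnly, Constant_L1.
std::string cacheTypeOf(const SharedGroup& g)
{
    std::string type;
    auto append = [&type](bool present, const char* name) {
        if (!present) return;
        if (!type.empty()) type += "+";
        type += name;
    };
    append(g.l1, "L1");
    append(g.texture, "Texture");
    append(g.readOnly, "ReadOnly");
    append(g.constantL1, "Constant_L1");
    return type;
}

// Pick the CSV line whose values the whole group takes over.
std::string valueSourceLine(const SharedGroup& g)
{
    if (g.l1) return "L1_DATA_CACHE";
    if (g.texture) return "TEXTURE_CACHE";
    if (g.readOnly) return "READ-ONLY_CACHE";
    if (g.constantL1) return "CONSTANT_L1_CACHE";   // assumption, see lead-in
    return "";                                       // empty group
}
```

With four flags and a fixed order, the 15 non-empty combinations produce exactly the 15 option strings listed above.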

The L1 cache (shared with others or not) has the GPU cores as children: either all cores of the SM, or the respective portion of them based on Caches_Per_SM.
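
One way to picture the "respective portion" is to split the cores of an SM into contiguous blocks, one block per L1 cache. This is only a sketch under that assumption; the text does not state how the parser assigns specific cores to caches, and `assignCoresToCaches` is a hypothetical name.

```cpp
#include <vector>

// Returns, for each of the cachesPerSM caches, the ids of the cores that
// would become its children. Cores are split into contiguous blocks
// (an assumption made for this sketch).
std::vector<std::vector<int>> assignCoresToCaches(int coresPerSM, int cachesPerSM)
{
    std::vector<std::vector<int>> groups;
    if (coresPerSM <= 0 || cachesPerSM <= 0)
        return groups;                          // nothing to assign
    groups.resize(cachesPerSM);
    for (int core = 0; core < coresPerSM; ++core) {
        // Maps core ids 0..coresPerSM-1 onto cache indices 0..cachesPerSM-1.
        int cache = static_cast<int>(
            static_cast<long long>(core) * cachesPerSM / coresPerSM);
        groups[cache].push_back(core);
    }
    return groups;
}
```

With 64 cores and Caches_Per_SM = 2, the first cache gets cores 0 to 31 and the second gets cores 32 to 63; with Caches_Per_SM = 1, the single cache gets all cores of the SM.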

- **Texture cache (Cache component)**. There are as many Texture caches created as specified in Caches_Per_SM (line TEXTURE_CACHE). The L1, Texture, ReadOnly, and Constant L1 caches may be shared on one physical chip -- if this is the case, they will also be represented as one Cache component in sys-sage. If the L1 cache is shared with others, the whole group takes over the values from L1_DATA_CACHE. If no L1 but a Texture cache is present, the group takes over the values from the TEXTURE_CACHE line. If neither L1 nor Texture is present but ReadOnly is, the group takes over the information from the READ-ONLY_CACHE line. The sharing is distinguished by the "cache_type" attribute. The possible options are "L1", "L1+Texture", "L1+ReadOnly", "L1+Constant_L1", "L1+Texture+ReadOnly", "L1+Texture+Constant_L1", "L1+ReadOnly+Constant_L1", "L1+Texture+ReadOnly+Constant_L1", "Texture", "Texture+ReadOnly", "Texture+Constant_L1", "Texture+ReadOnly+Constant_L1", "ReadOnly", "ReadOnly+Constant_L1", "Constant_L1". It contains the following information (if found in line TEXTURE_CACHE)
- cache_type
- id = 0
- Name = "Cache"
- cache_size
- cache_line_size

- **Read-Only cache (Cache component)**. There are as many Read-Only caches created as specified in Caches_Per_SM (line READ-ONLY_CACHE). The L1, Texture, ReadOnly, and Constant L1 caches may be shared on one physical chip -- if this is the case, they will also be represented as one Cache component in sys-sage. If the L1 cache is shared with others, the whole group takes over the values from L1_DATA_CACHE. If no L1 but a Texture cache is present, the group takes over the values from the TEXTURE_CACHE line. If neither L1 nor Texture is present but ReadOnly is, the group takes over the information from the READ-ONLY_CACHE line. The sharing is distinguished by the "cache_type" attribute. The possible options are "L1", "L1+Texture", "L1+ReadOnly", "L1+Constant_L1", "L1+Texture+ReadOnly", "L1+Texture+Constant_L1", "L1+ReadOnly+Constant_L1", "L1+Texture+ReadOnly+Constant_L1", "Texture", "Texture+ReadOnly", "Texture+Constant_L1", "Texture+ReadOnly+Constant_L1", "ReadOnly", "ReadOnly+Constant_L1", "Constant_L1". It contains the following information (if found in line READ-ONLY_CACHE)
- cache_type
- id = 0
- Name = "Cache"
- cache_size
- cache_line_size

- **Constant L1.5 cache (Cache component)**. The Constant L1.5 cache is created as a child of the SM it belongs to, and is filled with the information parsed from line CONST_L1_5_CACHE.
- cache_type = "Constant_L1.5"
- id = 0
- Name = "Cache"
- cache_size
- cache_line_size

Unless a Constant L1 cache is shared with the L1 cache, it is a child of the Constant L1.5 cache.

- **Constant L1 cache (Cache component)**. There are as many Constant L1 caches created as specified in Caches_Per_SM (line CONSTANT_L1_CACHE). The L1, Texture, ReadOnly, and Constant L1 caches may be shared on one physical chip -- if this is the case, they will also be represented as one Cache component in sys-sage. If the L1 cache is shared with others, the whole group takes over the values from L1_DATA_CACHE. If no L1 but a Texture cache is present, the group takes over the values from the TEXTURE_CACHE line. If neither L1 nor Texture is present but ReadOnly is, the group takes over the information from the READ-ONLY_CACHE line. The sharing is distinguished by the "cache_type" attribute. The possible options are "L1", "L1+Texture", "L1+ReadOnly", "L1+Constant_L1", "L1+Texture+ReadOnly", "L1+Texture+Constant_L1", "L1+ReadOnly+Constant_L1", "L1+Texture+ReadOnly+Constant_L1", "Texture", "Texture+ReadOnly", "Texture+Constant_L1", "Texture+ReadOnly+Constant_L1", "ReadOnly", "ReadOnly+Constant_L1", "Constant_L1". It contains the following information (if found in line CONSTANT_L1_CACHE)
- cache_type
- id = 0
- Name = "Cache"
- cache_size
- cache_line_size

If the Constant L1 cache is not shared with others, such as the L1 cache, it will be inserted as a child of the Constant L1.5 cache.

- **Shared memory (Memory component)** contains the following information (if found in line SHARED_MEMORY)
- size
- name = "Shared memory"

Shared memory is usually a child of an SM, unless the L2 cache is shared on SM-level (then it is a child of the L2 cache).

- **DataPath**
- **Load Latencies** are measured between the cores and several memories/caches. The DataPaths are oriented and of type SYS_SAGE_DATAPATH_TYPE_LOGICAL. Each contains the "Load_Latency" value from the particular entry. The value in GPU cycles is used (bool latency_in_cycles cannot be set up now).
The following DataPaths are created:
- **Global Memory --> each GPU core** (class Memory --> Thread)
- **Shared memory --> all GPU cores from the SM** (class Memory --> Thread)
- **L2 cache --> each GPU core** (class Cache --> Thread)
- **L1 cache --> all child GPU cores** (class Cache --> Thread)
- **Texture cache --> all GPU cores from the SM** (class Cache --> Thread) -- if Texture cache is shared with L1, this DP does not get created (//TODO create anyways?)
- **Read-only cache --> all GPU cores from the SM** (class Cache --> Thread) -- if Read-only cache is shared with L1 or Texture, this DP does not get created (//TODO create anyways?)
- **Constant L1 cache --> all GPU cores from the SM** (class Cache --> Thread) -- if Constant L1 cache is shared with L1, Texture, or Read-only, this DP does not get created (//TODO create anyways?)
- **Constant L1.5 cache --> all GPU cores from the SM** (class Cache --> Thread)
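
The rules above can be sketched in two small functions: one decides whether a cache gets its own Load Latency DataPaths, the other creates one oriented path per target core. All names (`CacheKind`, `Sharing`, `createsOwnPath`, `SketchDataPath`, `makeLoadLatencyPaths`) are hypothetical and only model the described behavior; they are not the sys-sage DataPath API.

```cpp
#include <string>
#include <vector>

enum class CacheKind { L1, Texture, ReadOnly, ConstantL1 };

// Which other caches this cache is physically shared with.
struct Sharing {
    bool withL1;
    bool withTexture;
    bool withReadOnly;
};

// A cache gets its own DataPaths only if it is not shared with a cache
// that appears earlier in the order L1, Texture, ReadOnly.
bool createsOwnPath(CacheKind kind, const Sharing& s)
{
    switch (kind) {
        case CacheKind::L1:         return true;
        case CacheKind::Texture:    return !s.withL1;
        case CacheKind::ReadOnly:   return !(s.withL1 || s.withTexture);
        case CacheKind::ConstantL1: return !(s.withL1 || s.withTexture || s.withReadOnly);
    }
    return false;
}

// Hypothetical record of one oriented, logical data path.
struct SketchDataPath {
    std::string source;        // e.g. "L2 cache"
    int targetCoreId;          // the GPU core the path points to
    double loadLatencyCycles;  // "Load_Latency" in GPU cycles
    bool oriented;
};

// One path from the source to every listed core.
std::vector<SketchDataPath> makeLoadLatencyPaths(const std::string& source,
                                                 const std::vector<int>& coreIds,
                                                 double loadLatencyCycles)
{
    std::vector<SketchDataPath> paths;
    paths.reserve(coreIds.size());
    for (int id : coreIds)
        paths.push_back({source, id, loadLatencyCycles, true});
    return paths;
}
```

For Global memory and the L2 cache, `coreIds` would cover every core of the GPU; for the SM-level caches and Shared memory, only the cores of the respective SM (or the child cores of the L1 cache).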

Line "REGISTER_INFORMATION" of the output is not parsed. (//TODO parse as well?)


1 change: 1 addition & 0 deletions docs/index.md
@@ -9,6 +9,7 @@ The main goal of the library is to to store, update, and provide all relevant in

- [sys-sage Library Concept](Concept.md)
- [Installation Guide](Installation_Guide.md)
- [Data Parsers Documentation](Data_Parsers.md)
- **API documentation**
- [**Component**](class_component.html) ( [Topology](class_topology.html), [Node](class_node.html), [Memory](class_memory.html), [Storage](class_storage.html), [Chip](class_chip.html), [Cache](class_cache.html), [Subdivision](class_subdivision.html), [Numa](class_numa.html), [Core](class_core.html), [Thread](class_thread.html) )
- [**Data Path**](class_data_path.html)
4 changes: 2 additions & 2 deletions examples/CMakeLists.txt
@@ -11,13 +11,13 @@ endif()

add_executable(basic_usage basic_usage.cpp)
add_executable(custom_attributes custom_attributes.cpp)
add_executable(gpu-topo-parser gpu-topo-parser.cpp)
add_executable(mt4g-parser mt4g-parser.cpp)
add_executable(larger_topo larger_topo.cpp)
add_executable(sys-sage-benchmarking sys-sage-benchmarking.cpp)
add_executable(use_custom_parser custom_parser_musa/use_custom_parser.cpp custom_parser_musa/musa_parser.cpp custom_parser_musa/musa_parser.hpp)
add_executable(cccbenchplushwloc cccbenchplushwloc.cpp)

install(TARGETS basic_usage gpu-topo-parser custom_attributes larger_topo sys-sage-benchmarking use_custom_parser cccbenchplushwloc DESTINATION bin/examples)
install(TARGETS basic_usage mt4g-parser custom_attributes larger_topo sys-sage-benchmarking use_custom_parser cccbenchplushwloc DESTINATION bin/examples)
install(DIRECTORY example_data DESTINATION bin/examples)

if(CAT_AWARE)
2 changes: 1 addition & 1 deletion examples/custom_parser_musa/musa_parser.cpp
@@ -156,7 +156,7 @@ Memory* MusaParser::ParseMemory() {
else if (output.find("tb") != std::string::npos)
size *= (long long)1000*(long long)1000*(long long)1000*(long long)1000;

Memory* mem = new Memory(socket, input, size);
Memory* mem = new Memory(socket, 0, input, size);
return mem;
}

6 changes: 3 additions & 3 deletions examples/gpu-topo-parser.cpp → examples/mt4g-parser.cpp
@@ -5,7 +5,7 @@

void usage(char* argv0)
{
std::cerr << "usage: " << argv0 << " <gpu-topo path>" << std::endl;
std::cerr << "usage: " << argv0 << " <mt4g output path>" << std::endl;
std::cerr << " or" << std::endl;
std::cerr << " " << argv0 << " (uses predefined paths which may be incorrect.)" << std::endl;
return;
@@ -32,8 +32,8 @@ int main(int argc, char *argv[])
Topology* topo = new Topology();
Node* n = new Node(topo,1);

cout << "-- Parsing gpu-topo benchmark from file " << gpuTopoPath << endl;
if(parseGpuTopo((Component*)n, gpuTopoPath, 0, ";") != 0) { //adds topo to a next node
cout << "-- Parsing mt4g output from file " << gpuTopoPath << endl;
if(parseMt4gTopo((Component*)n, gpuTopoPath, 0, ";") != 0) { //adds topo to a next node
return 1;
}
cout << "-- End parseGpuTopo" << endl;
2 changes: 1 addition & 1 deletion examples/sys-sage-benchmarking.cpp
@@ -101,7 +101,7 @@ int main(int argc, char *argv[])
//time mt4g parser
Chip* gpu = new Chip(n, 100, "GPU");
t_start = high_resolution_clock::now();
ret = parseGpuTopo(gpu, mt4gPath, ";");
ret = parseMt4gTopo(gpu, mt4gPath, ";");
t_end = high_resolution_clock::now();
uint64_t time_parseMt4g = t_end.time_since_epoch().count()-t_start.time_since_epoch().count()-timer_overhead;
if(ret != 0){
4 changes: 2 additions & 2 deletions src/CMakeLists.txt
@@ -12,7 +12,7 @@ set(SOURCES
xml_dump.cpp
parsers/hwloc.cpp
parsers/caps-numa-benchmark.cpp
parsers/gpu-topo.cpp
parsers/mt4g.cpp
parsers/cccbench.cpp
)

@@ -24,7 +24,7 @@ set(HEADERS
xml_dump.hpp
parsers/hwloc.hpp
parsers/caps-numa-benchmark.hpp
parsers/gpu-topo.hpp
parsers/mt4g.hpp
parsers/cccbench.cpp
)

64 changes: 63 additions & 1 deletion src/Topology.cpp
@@ -49,6 +49,68 @@ void Component::InsertChild(Component * child)
child->SetParent(this);
children.push_back(child);
}
int Component::InsertBetweenParentAndChild(Component* parent, Component* child, bool alreadyParentsChild)
{
    //consistency check
    vector<Component*> * p_children = parent->GetChildren();
    if(child->GetParent() != parent){
        if(std::find(p_children->begin(), p_children->end(), child) != p_children->end())
            return 1; //child and parent are not child and parent in the component tree
        else
            return 2; //corrupt component tree -> bad thing
    }
    else{
        if(std::find(p_children->begin(), p_children->end(), child) == p_children->end())
            return 3; //corrupt component tree -> bad thing
    }

    //remove from grandparent's list; set new parent; insert child into the new component's list
    p_children->erase(std::remove(p_children->begin(), p_children->end(), child), p_children->end());
    child->SetParent(this);
    this->InsertChild(child);

    //finally, insert new component to grandparent's children list
    if(!alreadyParentsChild)
    {
        this->SetParent(parent);
        parent->InsertChild(this);
    }

    return 0;
}
int Component::InsertBetweenParentAndChildren(Component* parent, vector<Component*> children, bool alreadyParentsChild)
{
    vector<Component*> * p_children = parent->GetChildren();
    for(Component* child: children) //first just check for consistency
    {
        bool isParent = (child->GetParent() == parent);
        if(std::find(p_children->begin(), p_children->end(), child) == p_children->end()){ //child not listed as parent's child
            if(isParent)
                return 2; //corrupt component tree -> bad thing
            else
                return 1; // just entered a component in the list, which is not a child of the parent
        }
        if(!isParent)
            return 3; //corrupt component tree -> bad thing
    }

    for(Component* child: children) //second time do the actual inserting
    {
        //remove from grandparent's list; set new parent; insert child into the new component's list
        p_children->erase(std::remove(p_children->begin(), p_children->end(), child), p_children->end());
        child->SetParent(this);
        this->InsertChild(child);
    }

    //finally, insert new component to grandparent's children list
    if(!alreadyParentsChild)
    {
        this->SetParent(parent);
        parent->InsertChild(this);
    }

    return 0;
}
int Component::RemoveChild(Component * child)
{
int orig_size = children.size();
@@ -537,7 +599,7 @@ Node::Node(int _id, string _name):Component(_id, _name, SYS_SAGE_COMPONENT_NODE)
Node::Node(Component * parent, int _id, string _name):Component(parent, _id, _name, SYS_SAGE_COMPONENT_NODE){}

Memory::Memory():Component(0, "Memory", SYS_SAGE_COMPONENT_MEMORY){}
Memory::Memory(Component * parent, string _name, long long _size):Component(parent, 0, _name, SYS_SAGE_COMPONENT_MEMORY), size(_size){}
Memory::Memory(Component * parent, int _id, string _name, long long _size):Component(parent, _id, _name, SYS_SAGE_COMPONENT_MEMORY), size(_size){}

Storage::Storage():Component(0, "Storage", SYS_SAGE_COMPONENT_STORAGE){}
Storage::Storage(Component * parent):Component(parent, 0, "Storage", SYS_SAGE_COMPONENT_STORAGE){}