Merge pull request #26 from caps-tum/mt4g-refactor

Mt4g refactor
caps-tum · Feb 19, 2024 · 2c934c1 · 2c934c1
2 parents 8b85749 + 5200253
commit 2c934c1
Showing 19 changed files with 491 additions and 203 deletions.
diff --git a/docs/Data_Parsers.md b/docs/Data_Parsers.md
@@ -0,0 +1,135 @@
+# sys-sage Data Parsers Documentation
+
+## Available Parsers
+- [hwloc](#hwloc) (CPU topology)
+- [mt4g](#mt4g) (GPU topology)
+
+<a id="hwloc"></a>
+### hwloc (CPU topology)
+//TODO
+
+
+
+<a id="mt4g"></a>
+### mt4g (GPU topology)
+Parser for the output of the mt4g project ( https://github.com/caps-tum/mt4g ). mt4g captures the memory topology of Nvidia GPUs, specifically all GPUs since the Kepler microarchitecture. It is a set of microbenchmarks that uncover the hidden structure and attributes of modern GPUs and present them to the user for further processing.
+
+#### General
+
+With mt4g, one can generate a .csv output file, which contains the GPU topology information and attributes regarding the GPU. This .csv is a sys-sage Data Source, which is parsed by the mt4g Data Parser.
+
+#### Parsing Logic
+The mt4g Parser creates a new GPU topology representation, starting at the GPU level (as Chip component of type SYS_SAGE_CHIP_TYPE_GPU).
+
+The topology is created with the following hierarchy:
+- ```[1   ]``` **GPU** (component Chip, chip type SYS_SAGE_CHIP_TYPE_GPU)
+    - ```[1..n]``` **Global memory** (component Memory; provided MAIN_MEMORY is Shared_On GPU-level; otherwise error)
+        - ```[1..n]``` **L2 cache** (component Cache; provided L2_DATA_CACHE is Shared_On GPU-level)
+            - ```[1..n]``` *other caches -- L1 cache, Texture cache, Read-only cache* (provided they are Shared_On GPU-level)
+                - ```[1..n]``` **SM** (component Subdivision, subdivisionType SYS_SAGE_SUBDIVISION_TYPE_GPU_SM)
+                    - ```[1..n]``` *L2 cache (provided L2_DATA_CACHE is Shared_On SM-level)*
+                        - ```[1..n]``` **other caches -- L1 cache, Texture cache, Read-only cache** (provided they are Shared_On SM-level) - Either as one object, if they are physically shared, or as separate objects if not.
+                            - ```[1..n]``` **GPU Core** (component HW_Thread) -- child of L1 cache
+                        - ```[1..n]``` **L1.5 Constant cache** (component Cache)
+                            - ```[1   ]``` **L1 Constant cache** (component Cache)
+                        - ```[1   ]``` **Shared memory** (component Memory; provided Shared_On SM-level; otherwise error)
+
+- The **GPU (Chip component)** contains the following information (if found in the CSV lines GPU_INFORMATION, COMPUTE_RESOURCE_INFORMATION, and ADDITIONAL_INFORMATION):
+    - vendor
+    - model 
+    - name = "GPU" (if new Chip is being created)
+    - ```attrib``` (key; value): "CUDA_compute_capability"; string* to the value
+    - ```attrib``` (key; value): "Number_of_streaming_multiprocessors"; int*
+    - ```attrib``` (key; value): "Number_of_cores_in_GPU"; int*
+    - ```attrib``` (key; value): "Number_of_cores_per_SM"; int*
+    - ```attrib``` (key; value): "GPU_Clock_Rate"; double* (clock rate in Hz)
+
+Each GPU has one Global memory child.
+
+- The **Global memory (Memory component)** contains the following information (if found in line ADDITIONAL_INFORMATION, line MAIN_MEMORY)
+    - size
+    - name = "GPU main memory"
+    - ```attrib``` (key; value): "Clock_Frequency", double* (clock rate in Hz, from field Memory_Clock_Frequency)
+    - ```attrib``` (key; value): "Bus_Width", int* (in bit, from field Memory_Bus_Width)
+
+The Global memory usually has one or more L2 caches as children. Alternatively, SMs can be children of Global memory, if the L2 cache is Shared_On SM-level.
+
+- The **L2 cache (Cache component)**. There may be multiple L2 cache segments, if these are detected in mt4g benchmarks (Caches_Per_GPU). It contains the following information (if found in line L2_DATA_CACHE)
+    - cache_type = "L2"
+    - id = 0
+    - cache_size
+    - cache_line_size
+
+The L2 cache usually has the SMs as children (Shared_On = GPU-level), but its children can also be the other caches (L1, Texture, Read-only) when those are Shared_On GPU-level.
+
+- **SM -- Streaming Multiprocessor (Subdivision component)**. Subdivision of type SYS_SAGE_SUBDIVISION_TYPE_GPU_SM. One SM gets created for each SM the GPU has (as in COMPUTE_RESOURCE_INFORMATION - Number_of_streaming_multiprocessors). It contains the following information
+    - Name = "SM (Streaming Multiprocessor)"
+    - id - goes from 0 to n-1
+    - subdivision_type = SYS_SAGE_SUBDIVISION_TYPE_GPU_SM
+
+SMs usually have multiple types of caches and Shared memory as children.
+
+//TODO what if caches are Shared_On GPU-level?
+
+- **L1 cache (Cache component)**. There are as many L1 caches created as specified in Caches_Per_SM (line L1_DATA_CACHE). The L1, Texture, ReadOnly, and Constant L1 caches may be shared on one physical chip -- if this is the case, they will also be represented as one Cache component in sys-sage. If L1 cache is shared with others, the whole group takes over the values from L1_DATA_CACHE. If no L1 but a Texture cache is present, the group takes over values from TEXTURE_CACHE line. If neither L1 nor Texture is present but ReadOnly is, the group takes over the information from the READ-ONLY_CACHE line. The sharing is distinguished by the "cache_type" attribute. The possible options are "L1", "L1+Texture", "L1+ReadOnly", "L1+Constant_L1", "L1+Texture+ReadOnly", "L1+Texture+Constant_L1", "L1+ReadOnly+Constant_L1", "L1+Texture+ReadOnly+Constant_L1", "Texture", "Texture+ReadOnly", "Texture+Constant_L1", "Texture+ReadOnly+Constant_L1", "ReadOnly", "ReadOnly+Constant_L1", "Constant_L1". It contains the following information (if found in line L1_DATA_CACHE)
+    - cache_type
+    - id = 0
+    - Name = "Cache"
+    - cache_size
+    - cache_line_size
+
+The L1 cache (shared with others or not) has the GPU cores (of the whole SM or a respective portion based on Caches_Per_SM) as children.
+
+- **Texture cache (Cache component)**. There are as many Texture caches created as specified in Caches_Per_SM (line TEXTURE_CACHE). The L1, Texture, ReadOnly, and Constant L1 caches may be shared on one physical chip -- if this is the case, they will also be represented as one Cache component in sys-sage. If L1 cache is shared with others, the whole group takes over the values from L1_DATA_CACHE. If no L1 but a Texture cache is present, the group takes over values from TEXTURE_CACHE line. If neither L1 nor Texture is present but ReadOnly is, the group takes over the information from the READ-ONLY_CACHE line. The sharing is distinguished by the "cache_type" attribute. The possible options are "L1", "L1+Texture", "L1+ReadOnly", "L1+Constant_L1", "L1+Texture+ReadOnly", "L1+Texture+Constant_L1", "L1+ReadOnly+Constant_L1", "L1+Texture+ReadOnly+Constant_L1", "Texture", "Texture+ReadOnly", "Texture+Constant_L1", "Texture+ReadOnly+Constant_L1", "ReadOnly", "ReadOnly+Constant_L1", "Constant_L1". It contains the following information (if found in line TEXTURE_CACHE)
+    - cache_type
+    - id = 0
+    - Name = "Cache"
+    - cache_size
+    - cache_line_size
+
+- **Read-Only cache (Cache component)**. There are as many Read-Only caches created as specified in Caches_Per_SM (line READ-ONLY_CACHE). The L1, Texture, ReadOnly, and Constant L1 caches may be shared on one physical chip -- if this is the case, they will also be represented as one Cache component in sys-sage. If L1 cache is shared with others, the whole group takes over the values from L1_DATA_CACHE. If no L1 but a Texture cache is present, the group takes over values from TEXTURE_CACHE line. If neither L1 nor Texture is present but ReadOnly is, the group takes over the information from the READ-ONLY_CACHE line. The sharing is distinguished by the "cache_type" attribute. The possible options are "L1", "L1+Texture", "L1+ReadOnly", "L1+Constant_L1", "L1+Texture+ReadOnly", "L1+Texture+Constant_L1", "L1+ReadOnly+Constant_L1", "L1+Texture+ReadOnly+Constant_L1", "Texture", "Texture+ReadOnly", "Texture+Constant_L1", "Texture+ReadOnly+Constant_L1", "ReadOnly", "ReadOnly+Constant_L1", "Constant_L1". It contains the following information (if found in line READ-ONLY_CACHE)
+    - cache_type
+    - id = 0
+    - Name = "Cache"
+    - cache_size
+    - cache_line_size
+
+- **Constant L1.5 cache (Cache component)**. The Constant L1.5 cache is created as a child of the SM it belongs to, and is filled with information parsed from line CONST_L1_5_CACHE.
+    - cache_type = "Constant_L1.5"
+    - id = 0
+    - Name = "Cache"
+    - cache_size
+    - cache_line_size
+
+Unless a Constant L1 cache is shared with the L1 cache, it is a child of the Constant L1.5 cache.
+
+- **Constant L1 cache (Cache component)**. There are as many Constant L1 caches created as specified in Caches_Per_SM (line CONSTANT_L1_CACHE). The L1, Texture, ReadOnly, and Constant L1 caches may be shared on one physical chip -- if this is the case, they will also be represented as one Cache component in sys-sage. If L1 cache is shared with others, the whole group takes over the values from L1_DATA_CACHE. If no L1 but a Texture cache is present, the group takes over values from TEXTURE_CACHE line. If neither L1 nor Texture is present but ReadOnly is, the group takes over the information from the READ-ONLY_CACHE line. The sharing is distinguished by the "cache_type" attribute. The possible options are "L1", "L1+Texture", "L1+ReadOnly", "L1+Constant_L1", "L1+Texture+ReadOnly", "L1+Texture+Constant_L1", "L1+ReadOnly+Constant_L1", "L1+Texture+ReadOnly+Constant_L1", "Texture", "Texture+ReadOnly", "Texture+Constant_L1", "Texture+ReadOnly+Constant_L1", "ReadOnly", "ReadOnly+Constant_L1", "Constant_L1". It contains the following information (if found in line CONSTANT_L1_CACHE)
+    - cache_type
+    - id = 0
+    - Name = "Cache"
+    - cache_size
+    - cache_line_size
+
+If the Constant L1 cache is not shared with others, such as the L1 cache, it will be inserted as a child of the Constant L1.5 cache.
+
+- **Shared memory (Memory component)** contains the following information (if found in line SHARED_MEMORY)
+    - size
+    - name = "Shared memory"
+
+Shared memory is usually a child of an SM, unless the L2 cache is shared on SM level (then it is a child of the L2 cache).
+
+- **DataPath**
+    - **Load Latencies** are measured between the cores and several memories/caches. The resulting DataPaths are oriented and of type SYS_SAGE_DATAPATH_TYPE_LOGICAL. Each contains the "Load_Latency" value from the particular entry. The value in GPU cycles is used (bool latency_in_cycles cannot be set up now).
+    The following DataPaths are created: 
+        - **Global Memory --> each GPU core** (class Memory --> Thread)
+        - **Shared memory --> all GPU cores from the SM** (class Memory --> Thread)
+        - **L2 cache --> each GPU core** (class Cache --> Thread)
+        - **L1 cache --> all child GPU cores** (class Cache --> Thread)
+        - **Texture cache --> all GPU cores from the SM** (class Cache --> Thread) -- if Texture cache is shared with L1, this DP does not get created (//TODO create anyways?)
+        - **Read-only cache --> all GPU cores from the SM** (class Cache --> Thread) -- if Read-only cache is shared with L1 or Texture, this DP does not get created (//TODO create anyways?)
+        - **Constant L1 cache --> all GPU cores from the SM** (class Cache --> Thread) -- if Constant L1 cache is shared with L1, Texture, or Read-only, this DP does not get created (//TODO create anyways?)
+        - **Constant L1.5 cache --> all GPU cores from the SM** (class Cache --> Thread) 
+
+Line "REGISTER_INFORMATION" of the output is not parsed. (//TODO parse as well?)
+
+
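
The sharing rules that Data_Parsers.md describes for the L1, Texture, Read-only, and Constant L1 caches can be sketched as follows. This is a minimal, self-contained illustration and not the parser's actual code: `SharedCacheGroup`, `cacheTypeLabel`, and `sourceLine` are hypothetical names, and the fallback to CONSTANT_L1_CACHE for a group that contains only a Constant L1 cache is an assumption, since the text does not state it explicitly.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: which SM-level caches form one physically shared group.
struct SharedCacheGroup {
    bool l1;
    bool texture;
    bool readOnly;
    bool constantL1;
};

// Build the "cache_type" label by joining the present caches with '+',
// in the fixed order L1, Texture, ReadOnly, Constant_L1.
std::string cacheTypeLabel(const SharedCacheGroup& g)
{
    assert(g.l1 || g.texture || g.readOnly || g.constantL1); // an empty group is invalid
    std::string label;
    auto append = [&label](bool present, const char* name) {
        if (!present) return;
        if (!label.empty()) label += "+";
        label += name;
    };
    append(g.l1, "L1");
    append(g.texture, "Texture");
    append(g.readOnly, "ReadOnly");
    append(g.constantL1, "Constant_L1");
    return label;
}

// Name of the mt4g CSV line whose values the whole group takes over:
// L1 wins over Texture, Texture wins over ReadOnly.
std::string sourceLine(const SharedCacheGroup& g)
{
    if (g.l1) return "L1_DATA_CACHE";
    if (g.texture) return "TEXTURE_CACHE";
    if (g.readOnly) return "READ-ONLY_CACHE";
    return "CONSTANT_L1_CACHE"; // assumption: not stated explicitly in the documentation
}
```

For example, `cacheTypeLabel({false, true, true, false})` yields `"Texture+ReadOnly"`, and `sourceLine` for the same group yields `"TEXTURE_CACHE"`. Joining in a fixed order produces exactly the 15 labels listed in the documentation.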
diff --git a/docs/index.md b/docs/index.md
@@ -9,6 +9,7 @@ The main goal of the library is to to store, update, and provide all relevant in
 
 - [sys-sage Library Concept](Concept.md)
 - [Installation Guide](Installation_Guide.md)
+- [Data Parsers Documentation](Data_Parsers.md)
 - **API documentation**
     - [**Component**](class_component.html) ( [Topology](class_topology.html), [Node](class_node.html), [Memory](class_memory.html), [Storage](class_storage.html),  [Chip](class_chip.html), [Cache](class_cache.html), [Subdivision](class_subdivision.html), [Numa](class_numa.html), [Core](class_core.html), [Thread](class_thread.html) )
     - [**Data Path**](class_data_path.html)

diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt
@@ -11,13 +11,13 @@ endif()
 
 add_executable(basic_usage basic_usage.cpp)
 add_executable(custom_attributes custom_attributes.cpp)
-add_executable(gpu-topo-parser gpu-topo-parser.cpp)
+add_executable(mt4g-parser mt4g-parser.cpp)
 add_executable(larger_topo larger_topo.cpp)
 add_executable(sys-sage-benchmarking sys-sage-benchmarking.cpp)
 add_executable(use_custom_parser custom_parser_musa/use_custom_parser.cpp custom_parser_musa/musa_parser.cpp custom_parser_musa/musa_parser.hpp)
 add_executable(cccbenchplushwloc cccbenchplushwloc.cpp)
 
-install(TARGETS basic_usage gpu-topo-parser custom_attributes larger_topo sys-sage-benchmarking use_custom_parser cccbenchplushwloc DESTINATION bin/examples)
+install(TARGETS basic_usage mt4g-parser custom_attributes larger_topo sys-sage-benchmarking use_custom_parser cccbenchplushwloc DESTINATION bin/examples)
 install(DIRECTORY example_data DESTINATION bin/examples)
 
 if(CAT_AWARE)

diff --git a/examples/custom_parser_musa/musa_parser.cpp b/examples/custom_parser_musa/musa_parser.cpp
@@ -156,7 +156,7 @@ Memory* MusaParser::ParseMemory() {
     else if (output.find("tb") != std::string::npos)
         size *= (long long)1000*(long long)1000*(long long)1000*(long long)1000;
 
-    Memory* mem = new Memory(socket, input, size);
+    Memory* mem = new Memory(socket, 0, input, size);
 	return mem;
 }
 

diff --git a/examples/gpu-topo-parser.cpp → examples/mt4g-parser.cpp b/examples/gpu-topo-parser.cpp → examples/mt4g-parser.cpp
@@ -5,7 +5,7 @@
 
 void usage(char* argv0)
 {
-    std::cerr << "usage: " << argv0 << " <gpu-topo path>" << std::endl;
+    std::cerr << "usage: " << argv0 << " <mt4g output path>" << std::endl;
     std::cerr << "       or" << std::endl;
     std::cerr << "       " << argv0 << " (uses predefined paths which may be incorrect.)" << std::endl;
     return;
@@ -32,8 +32,8 @@ int main(int argc, char *argv[])
     Topology* topo = new Topology();
     Node* n = new Node(topo,1);
 
-    cout << "-- Parsing gpu-topo benchmark from file " << gpuTopoPath << endl;
-    if(parseGpuTopo((Component*)n, gpuTopoPath, 0, ";") != 0) { //adds topo to a next node
+    cout << "-- Parsing mt4g output from file " << gpuTopoPath << endl;
+    if(parseMt4gTopo((Component*)n, gpuTopoPath, 0, ";") != 0) { //adds topo to a next node
         return 1;
     }
     cout << "-- End parseGpuTopo" << endl;

diff --git a/examples/sys-sage-benchmarking.cpp b/examples/sys-sage-benchmarking.cpp
@@ -101,7 +101,7 @@ int main(int argc, char *argv[])
     //time mt4g parser
     Chip* gpu = new Chip(n, 100, "GPU");
     t_start = high_resolution_clock::now();
-    ret = parseGpuTopo(gpu, mt4gPath, ";");
+    ret = parseMt4gTopo(gpu, mt4gPath, ";");
     t_end = high_resolution_clock::now();
     uint64_t time_parseMt4g = t_end.time_since_epoch().count()-t_start.time_since_epoch().count()-timer_overhead;
     if(ret != 0){

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
@@ -12,7 +12,7 @@ set(SOURCES
     xml_dump.cpp
     parsers/hwloc.cpp
     parsers/caps-numa-benchmark.cpp
-    parsers/gpu-topo.cpp
+    parsers/mt4g.cpp
     parsers/cccbench.cpp
     )
 
@@ -24,7 +24,7 @@ set(HEADERS
     xml_dump.hpp
     parsers/hwloc.hpp
     parsers/caps-numa-benchmark.hpp
-    parsers/gpu-topo.hpp
+    parsers/mt4g.hpp
     parsers/cccbench.cpp
     )
 

diff --git a/src/Topology.cpp b/src/Topology.cpp
@@ -49,6 +49,68 @@ void Component::InsertChild(Component * child)
     child->SetParent(this);
     children.push_back(child);
 }
+int Component::InsertBetweenParentAndChild(Component* parent, Component* child, bool alreadyParentsChild)
+{
+    //consistency check
+    vector<Component*> * p_children = parent->GetChildren();
+    if(child->GetParent() != parent){
+        if(std::find(p_children->begin(), p_children->end(), child) == p_children->end())
+            return 1; //child and parent are not child and parent in the component tree
+        else
+            return 2; //corrupt component tree (listed as parent's child, but child's parent pointer differs) -> bad thing
+    }
+    else{
+        if(std::find(p_children->begin(), p_children->end(), child) == p_children->end())
+            return 3; //corrupt component tree -> bad thing
+    }
+
+    //remove from grandparent's list; set new parent; insert child into the new component's list
+    p_children->erase(std::remove(p_children->begin(), p_children->end(), child), p_children->end());
+    child->SetParent(this);
+    this->InsertChild(child);
+
+    //finally, insert new component to grandparent's children list
+    if(!alreadyParentsChild)
+    {
+        this->SetParent(parent);
+        parent->InsertChild(this);
+    }
+
+    return 0;
+}
+int Component::InsertBetweenParentAndChildren(Component* parent, vector<Component*> children, bool alreadyParentsChild)
+{
+    vector<Component*> * p_children = parent->GetChildren();
+    for(Component* child: children) //first just check for consistency
+    {
+        bool isParent = (child->GetParent() == parent);      
+        if(std::find(p_children->begin(), p_children->end(), child) == p_children->end()){  //child not listed as parent's child
+            if(isParent)
+                return 2; //corrupt component tree -> bad thing
+            else
+                return 1; // just entered a component in the list, which is not a child of the parent
+        }
+        if(!isParent)
+            return 3; //corrupt component tree -> bad thing
+    }
+
+    for(Component* child: children) //second time do the actual inserting
+    {
+        //remove from grandparent's list; set new parent; insert child into the new component's list
+        p_children->erase(std::remove(p_children->begin(), p_children->end(), child), p_children->end());
+        child->SetParent(this);
+        this->InsertChild(child);
+    }
+
+    //finally, insert new component to grandparent's children list
+    if(!alreadyParentsChild)
+    {
+        this->SetParent(parent);
+        parent->InsertChild(this);
+    }
+
+    return 0;
+}
 int Component::RemoveChild(Component * child)
 {
     int orig_size = children.size();
@@ -537,7 +599,7 @@ Node::Node(int _id, string _name):Component(_id, _name, SYS_SAGE_COMPONENT_NODE)
 Node::Node(Component * parent, int _id, string _name):Component(parent, _id, _name, SYS_SAGE_COMPONENT_NODE){}
 
 Memory::Memory():Component(0, "Memory", SYS_SAGE_COMPONENT_MEMORY){}
-Memory::Memory(Component * parent, string _name, long long _size):Component(parent, 0, _name, SYS_SAGE_COMPONENT_MEMORY), size(_size){}
+Memory::Memory(Component * parent, int _id, string _name, long long _size):Component(parent, _id, _name, SYS_SAGE_COMPONENT_MEMORY), size(_size){}
 
 Storage::Storage():Component(0, "Storage", SYS_SAGE_COMPONENT_STORAGE){}
 Storage::Storage(Component * parent):Component(parent, 0, "Storage", SYS_SAGE_COMPONENT_STORAGE){}
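
The re-parenting performed by `InsertBetweenParentAndChild` in the Topology.cpp hunk can be tried out in isolation with a small stand-in type. The sketch below is an illustration under stated assumptions, not sys-sage code: `MiniComponent`, `insertChild`, and `insertBetween` are hypothetical names, and the return codes follow the meaning given in the hunk's comments (1 = not a child of the parent, 2 and 3 = inconsistent tree).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tree component; only what the sketch needs.
struct MiniComponent {
    std::string name;
    MiniComponent* parent = nullptr;
    std::vector<MiniComponent*> children;

    explicit MiniComponent(std::string n) : name(std::move(n)) {}

    // Append child and set its parent pointer.
    void insertChild(MiniComponent* child) {
        child->parent = this;
        children.push_back(child);
    }

    // Place *this between p and child.
    // Returns 0 on success, 1 if child is not a child of p,
    // 2 or 3 if list membership and parent pointer disagree.
    int insertBetween(MiniComponent* p, MiniComponent* child, bool alreadyParentsChild) {
        std::vector<MiniComponent*>& siblings = p->children;
        bool listed = std::find(siblings.begin(), siblings.end(), child) != siblings.end();
        bool pointsBack = (child->parent == p);
        if (!pointsBack && !listed) return 1;
        if (!pointsBack && listed) return 2;
        if (pointsBack && !listed) return 3;

        // Detach child from p (erase-remove idiom), then adopt it.
        siblings.erase(std::remove(siblings.begin(), siblings.end(), child), siblings.end());
        insertChild(child);

        // Attach *this to p unless it already is one of p's children.
        if (!alreadyParentsChild) p->insertChild(this);
        return 0;
    }
};
```

Starting from a tree `GPU -> SM`, calling `l2.insertBetween(&gpu, &sm, false)` on a fresh `MiniComponent l2("L2")` results in `GPU -> L2 -> SM`. The consistency checks run before any mutation, so a non-zero return leaves the tree untouched.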