A generic parser for the Microsoft Academic Graph (MAG). To process the parsed structures, you have to supply a callback class.
The module also contains a tool that creates a subgraph of MAG by filtering certain tables.
All of the following classes are in the package org.gradoop.examples.io.mag.magimport.
data.MagObject: An object containing a parsed structure from the graph. It can be an edge or a node. All attributes are stored in a string array.
data.TableSchema: Describes the structure of one MAG TSV file. Contains a list of all columns and column types. Can be created via data.TableSchema.Builder.
callback.ElementProcessor: The callback interface for processing parsed data.
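The following is a minimal sketch of how a schema, a string-array object and a callback can fit together when parsing a TSV table. SketchSchema, SketchMagObject, SketchElementProcessor and SketchTsvParser are hypothetical stand-ins for data.TableSchema, data.MagObject and callback.ElementProcessor; the real classes' constructors and methods are not documented here and may differ.

```java
import java.util.List;

// Hypothetical stand-in for data.TableSchema: a table name plus its column names.
final class SketchSchema {
    final String tableName;
    final List<String> columns;

    SketchSchema(String tableName, List<String> columns) {
        this.tableName = tableName;
        this.columns = columns;
    }
}

// Hypothetical stand-in for data.MagObject: every attribute value is kept as a string.
final class SketchMagObject {
    final SketchSchema schema;
    final String[] attributes;

    SketchMagObject(SketchSchema schema, String[] attributes) {
        this.schema = schema;
        this.attributes = attributes;
    }
}

// Hypothetical stand-in for callback.ElementProcessor.
interface SketchElementProcessor {
    void process(SketchMagObject object);
}

final class SketchTsvParser {
    // Splits each TSV line into one field per schema column and passes it to the callback.
    static void parse(SketchSchema schema, List<String> lines, SketchElementProcessor callback) {
        for (String line : lines) {
            String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
            if (fields.length != schema.columns.size()) {
                continue; // this sketch silently skips rows with the wrong column count
            }
            callback.process(new SketchMagObject(schema, fields));
        }
    }
}
```

Because the callback is an interface with a single method, a collecting or counting processor can be passed as a lambda or method reference, e.g. `SketchTsvParser.parse(schema, lines, results::add)`.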
An implementation of the magimport-parse callback interface. It parses the input files via magimport-parse and saves the results in the Gradoop JSON file format.
All of the following classes are in the package org.gradoop.examples.io.mag.magimport.gradoop.
GradoopElementProcessor: Implementation of the magimport-parse callback interface.
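As a rough illustration of the output side, the sketch below serializes one parsed node as a single JSON line. It assumes the line layout {"id", "data", "meta": {"label", "graphs"}} used by Gradoop's JSON data source; check this against the Gradoop version in use. JsonLineSketch, vertexLine and quote are hypothetical names and not part of GradoopElementProcessor.

```java
import java.util.Map;

final class JsonLineSketch {
    // Escapes the characters that must not appear raw inside a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds one vertex line: {"id":...,"data":{...},"meta":{"label":...,"graphs":[...]}}
    // Assumption: all property values are written as JSON strings.
    static String vertexLine(String id, String label, String graphId, Map<String, String> properties) {
        StringBuilder data = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (!first) {
                data.append(',');
            }
            data.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        data.append('}');
        return "{\"id\":" + quote(id)
            + ",\"data\":" + data
            + ",\"meta\":{\"label\":" + quote(label)
            + ",\"graphs\":[" + quote(graphId) + "]}}";
    }
}
```

Edge lines would additionally need the source and target vertex ids; they are omitted here to keep the sketch short.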
Loads a Gradoop logical graph from the Gradoop JSON file format and applies the groupBy operator to it. Edges and nodes are each aggregated with a count. The resulting graph is written in the DOT format.
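To show what grouping with a count aggregate produces, here is a plain-Java sketch; it does not use Gradoop's groupBy API. It assumes the grouping keys are labels: nodes are grouped by label, edges by (source label, edge label, target label), each group is counted, and the summary graph is rendered as DOT. GroupingSketch and summarize are hypothetical names, and GroupingMain's actual grouping keys and DOT layout may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupingSketch {
    // Each edge is given as a 3-element array {sourceLabel, edgeLabel, targetLabel}.
    static String summarize(List<String> vertexLabels, List<String[]> edges) {
        // Count nodes per label (insertion order keeps the output deterministic).
        Map<String, Integer> vertexCounts = new LinkedHashMap<>();
        for (String label : vertexLabels) {
            vertexCounts.merge(label, 1, Integer::sum);
        }
        // Count edges per (source label, edge label, target label).
        Map<List<String>, Integer> edgeCounts = new LinkedHashMap<>();
        for (String[] e : edges) {
            edgeCounts.merge(Arrays.asList(e[0], e[1], e[2]), 1, Integer::sum);
        }
        StringBuilder dot = new StringBuilder("digraph summary {\n");
        for (Map.Entry<String, Integer> v : vertexCounts.entrySet()) {
            dot.append("  ").append(id(v.getKey()))
               .append(" [label=\"").append(esc(v.getKey()))
               .append(" (").append(v.getValue()).append(")\"];\n");
        }
        for (Map.Entry<List<String>, Integer> e : edgeCounts.entrySet()) {
            List<String> k = e.getKey();
            dot.append("  ").append(id(k.get(0))).append(" -> ").append(id(k.get(2)))
               .append(" [label=\"").append(esc(k.get(1)))
               .append(" (").append(e.getValue()).append(")\"];\n");
        }
        return dot.append("}\n").toString();
    }

    // Escapes backslashes and double quotes for use inside a quoted DOT string.
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    private static String id(String s) {
        return "\"" + esc(s) + "\"";
    }
}
```

For three nodes labeled Paper, Paper and Author plus two Author-writes-Paper edges, the summary contains one Paper node with count 2, one Author node with count 1, and a single writes edge with count 2.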
java org.gradoop.examples.io.mag.magimport.tools.subgraph.SubgraphCreator <input path> <output path> <filter>
The input path should be the path containing the MAG TSV files.
The output path must be a directory to store the filtered MAG TSV files in.
The filter is a string that is searched for in the Affiliations table (case-insensitive).
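A case-insensitive filter over table rows can be sketched as follows. Assumption: the whole TSV row is searched; the real SubgraphCreator may only search specific columns (for example the affiliation name) and then follow references into other tables, which this sketch does not attempt. AffiliationFilterSketch and matchingRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AffiliationFilterSketch {
    // Returns the rows that contain the filter string, ignoring case.
    // Locale.ROOT avoids locale-specific case mappings (e.g. the Turkish dotless i).
    static List<String> matchingRows(List<String> tsvRows, String filter) {
        String needle = filter.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (String row : tsvRows) {
            if (row.toLowerCase(Locale.ROOT).contains(needle)) {
                result.add(row);
            }
        }
        return result;
    }
}
```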
java org.gradoop.examples.io.mag.magimport.gradoop.ImportMain <input path> <output path> [<parse limit>]
The input path should be the path containing the extracted MAG TSV files.
The output path must be a directory and will contain the JSON output files. The directory will be created if it doesn't already exist.
The parse limit sets the maximum number of lines parsed per TSV file. How high it can be set depends on the available memory. The default value is 20000.
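The argument handling described above (two required paths, an optional parse limit defaulting to 20000, and an output directory that is created when missing) could look roughly like this. ImportArgsSketch, parse, prepareOutput and readLimited are hypothetical names; the actual ImportMain may validate its arguments differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

final class ImportArgsSketch {
    static final int DEFAULT_PARSE_LIMIT = 20000; // default documented for ImportMain

    final Path input;
    final Path output;
    final int parseLimit;

    private ImportArgsSketch(Path input, Path output, int parseLimit) {
        this.input = input;
        this.output = output;
        this.parseLimit = parseLimit;
    }

    // Parses "<input path> <output path> [<parse limit>]".
    static ImportArgsSketch parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException("usage: <input path> <output path> [<parse limit>]");
        }
        int limit = args.length == 3 ? Integer.parseInt(args[2]) : DEFAULT_PARSE_LIMIT;
        if (limit <= 0) {
            throw new IllegalArgumentException("parse limit must be positive");
        }
        return new ImportArgsSketch(Paths.get(args[0]), Paths.get(args[1]), limit);
    }

    // Creates the output directory and any missing parents; does nothing if it already exists.
    void prepareOutput() throws IOException {
        Files.createDirectories(output);
    }

    // Reads at most parseLimit lines from one TSV file.
    List<String> readLimited(BufferedReader reader) {
        return reader.lines().limit(parseLimit).collect(Collectors.toList());
    }
}
```

Capping lines per file keeps memory bounded, which matches the note that the usable parse limit depends on available memory.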
java org.gradoop.examples.io.mag.magimport.grouping.GroupingMain <input path> <output path>
The input path should be the output path of the import step (ImportMain), i.e. the directory containing the JSON files.
The output path must be a directory and will contain the DOT output files. The directory will be created if it doesn't already exist.