MAG Import

Sub Projects

magimport-parse

A generic parser for the Microsoft Academic Graph (MAG). You must supply a callback class to process the parsed structures.
The module also contains a tool that creates a subgraph of MAG by filtering certain tables.

Important classes

All of the following classes are in package org.gradoop.examples.io.mag.magimport.

data.MagObject: An object containing a parsed structure from the graph. It can be an edge or a node. All attributes are stored in a string array.
data.TableSchema: Describes the structure of one MAG TSV file. Contains a list of all columns and column types. Can be created via data.TableSchema.Builder.
callback.ElementProcessor: The callback interface for processing parsed data.
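
The callback pattern described above can be sketched as follows. This README does not show the real signatures of ElementProcessor or MagObject, so the types SketchMagObject and SketchElementProcessor, the method name process, and the helper CallbackSketch.parse are hypothetical stand-ins for illustration, not the project's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for data.MagObject: a table name plus the raw string fields.
final class SketchMagObject {
    final String tableName;
    final String[] fields;

    SketchMagObject(String tableName, String[] fields) {
        this.tableName = tableName;
        this.fields = fields;
    }
}

// Hypothetical stand-in for callback.ElementProcessor (real method names may differ).
interface SketchElementProcessor {
    void process(SketchMagObject object);
}

// Example callback that simply collects every object it is handed.
final class CollectingProcessor implements SketchElementProcessor {
    final List<SketchMagObject> seen = new ArrayList<>();

    @Override
    public void process(SketchMagObject object) {
        seen.add(object);
    }
}

final class CallbackSketch {
    // Splits each TSV line on tabs and passes the result to the callback.
    static void parse(String tableName, List<String> lines, SketchElementProcessor callback) {
        for (String line : lines) {
            // The -1 limit keeps trailing empty columns instead of dropping them.
            callback.process(new SketchMagObject(tableName, line.split("\t", -1)));
        }
    }
}
```

The parser owns the iteration and the callback owns what happens to each element, which is why a second module such as magimport-gradoop can reuse the parser by supplying only its own processor.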

magimport-gradoop

An implementation of the magimport-parse callback interface. Parses the input files via magimport-parse and saves the results in the Gradoop JSON file format.

Important classes

All of the following classes are in package org.gradoop.examples.io.mag.magimport.gradoop.

GradoopElementProcessor: Implementation of the magimport-parse callback interface.

magimport-grouping

Loads a Gradoop logical graph from the Gradoop JSON file format and applies the groupBy operator to it. The attributes of nodes and edges are each aggregated by count. The resulting graph is written in the DOT format.
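
To illustrate what grouping with a count aggregate produces, here is a minimal sketch in plain Java. It does not use the Gradoop API. The class GroupingSketch and its methods are hypothetical, and grouping by element label is an assumption, since this README does not state the grouping key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupingSketch {
    // Counts how many elements carry each label (one count aggregate per group).
    // A TreeMap keeps the groups in a stable, sorted order.
    static Map<String, Long> countByLabel(List<String> labels) {
        Map<String, Long> counts = new TreeMap<>();
        for (String label : labels) {
            counts.merge(label, 1L, Long::sum);
        }
        return counts;
    }

    // Renders one DOT node per group, with the count shown in the node label.
    static String toDot(Map<String, Long> vertexCounts) {
        StringBuilder dot = new StringBuilder("digraph grouped {\n");
        for (Map.Entry<String, Long> e : vertexCounts.entrySet()) {
            dot.append("  \"").append(e.getKey()).append("\" [label=\"")
               .append(e.getKey()).append(" (").append(e.getValue()).append(")\"];\n");
        }
        return dot.append("}\n").toString();
    }
}
```

For example, the labels Paper, Author, Paper collapse into two groups, and the DOT output contains one node for each group annotated with its count.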

Usage

Step 0: Create a subgraph (OPTIONAL)

java org.gradoop.examples.io.mag.magimport.tools.subgraph.SubgraphCreator <input path> <output path> <filter>

The input path should be the path containing the MAG TSV files.

The output path must be a directory to store the filtered MAG TSV files in.

The filter is a string that is searched for in the Affiliations table (case-insensitive).
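
A case-insensitive search of this kind could look like the sketch below. The class AffiliationFilterSketch is hypothetical. It assumes a substring match against the whole TSV line, which this README does not confirm; the actual tool may match a specific column instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AffiliationFilterSketch {
    // Returns the lines that contain the filter string, ignoring case.
    static List<String> filter(List<String> lines, String filter) {
        // Locale.ROOT avoids locale-specific case mappings (for example the Turkish dotless i).
        String needle = filter.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(line);
            }
        }
        return matches;
    }
}
```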

Step 1: Parse and import graph

java org.gradoop.examples.io.mag.magimport.gradoop.ImportMain <input path> <output path> [<parse limit>]

The input path should be the path containing the extracted MAG TSV files.

The output path must be a directory and will contain the JSON output files. The directory will be created if it doesn't already exist.

The parse limit caps how many lines of each TSV file are parsed. The usable value is bounded by available memory. The default value is 20000.
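
The effect of a per-file parse limit can be sketched as follows. The class ParseLimitSketch and the method readRows are hypothetical; only the default of 20000 comes from this README. The sketch also shows why memory bounds the limit: every parsed row is held in a list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ParseLimitSketch {
    // Default taken from the README; the real constant name is not known.
    static final int DEFAULT_LIMIT = 20000;

    // Reads at most `limit` lines and splits each one on tabs.
    static List<String[]> readRows(BufferedReader reader, int limit) {
        List<String[]> rows = new ArrayList<>();
        try {
            String line;
            // Stop as soon as the limit is reached or the input ends.
            while (rows.size() < limit && (line = reader.readLine()) != null) {
                rows.add(line.split("\t", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```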

Step 2: Group graph

java org.gradoop.examples.io.mag.magimport.grouping.GroupingMain <input path> <output path>

The input path should be the output path from step 1 (containing the JSON files).

The output path must be a directory and will contain the DOT output files. The directory will be created if it doesn't already exist.
