-
Hello,
There are multiple types of datasets (e.g., Dataset, FileDataset, TensorDataset, CSVDataset), and each has its own structure. Despite their different structures, all of them can be grounded and turned into an instance of BuiltDataset (a dataset with the samples field), which is then used for evaluation and training. I assume the problem here is that you didn't build the dataset. All dataset types can also be converted into a Dataset (a dataset of valued logic fact examples) by calling their to_dataset() method. But I would suggest avoiding turning a TensorDataset into a Dataset, as the conversion might be slow, considering PPI is quite large. Instead, it might be a better idea to build the TensorDataset and work with the resulting BuiltDataset directly.
-
Sorry for the late response. You were right, I didn't ground it. But I also realized there were no relations defined in the dataset. Thank you!
-
@erhc Hi, PyNeuraLogic supports loading datasets from databases, so even the relational.fit.cvut.cz repository fits. I would recommend loading the data, dumping it into files, and then using FileDataset:

import mariadb
from neuralogic.dataset import DBDataset, DBSource

conn = mariadb.connect(
    user="guest",
    password="relational",
    host="relational.fit.cvut.cz",
    port=3306,
    database="Toxicology",
)

# relation name, table name, which columns are mapped to terms, which column is mapped to the value
molecule = DBSource("molecule", "molecule", ["molecule_id"], "label", value_mapper=lambda x: -1 if x == "-" else 1)
atom = DBSource("atom", "atom", ["atom_id", "molecule_id", "element"])
bond = DBSource("bond", "bond", ["bond_id", "molecule_id", "bond_type"])
connected = DBSource("connected", "connected", ["atom_id", "atom_id2", "bond_id"])

dataset = DBDataset(
    conn,
    [atom, bond, connected],  # example sources
    molecule,  # query source
)

logic_dataset = dataset.to_dataset()
logic_dataset.dump_to_file("queries.txt", "examples.txt")
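As a side note, the value_mapper above is just a plain callable applied to each fetched label value. A minimal self-contained sketch of the same mapping logic (the raw label strings below are made up, mirroring the "-"/other convention of the Toxicology example):

```python
# Map raw DB label strings to numeric target values,
# mirroring the value_mapper lambda used for the molecule source.
value_mapper = lambda x: -1 if x == "-" else 1

raw_labels = ["-", "+", "+", "-"]  # hypothetical fetched labels
targets = [value_mapper(label) for label in raw_labels]
print(targets)  # -> [-1, 1, 1, -1]
```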
You should also install the latest version of PyNeuraLogic (which I just released). It fixes an issue where some capitalized constants fetched from the DB were not lowercased, which changed their meaning to variables.

Regarding TUDataset, you could load the datasets with PyG and then create the PyNeuraLogic dataset from the PyG dataset:

from torch_geometric.datasets import TUDataset
from neuralogic.dataset import TensorDataset, Data

ds = TUDataset(root=..., name=...)
dataset = TensorDataset(data=[Data.from_pyg(data)[0] for data in ds], number_of_classes=...)

Also, from looking at the format of the datasets you posted, they are just a bunch of CSV files. PyNeuraLogic has a CSVDataset (in fact, DBDataset uses it in the background), but I don't think it is usable in this case. You can open all the files and read them line by line (one line = one relation in the dataset). E.g.:

from neuralogic.core import R
from neuralogic.dataset import Dataset

examples = []
with open("BZR_A.txt", mode="r") as fp:
    for line in fp.readlines():
        terms = line.split(",")
        examples.append(R.edge(terms[0].strip(), terms[1].strip()))

# load the other files similarly
dataset = Dataset(examples, queries)
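The line-by-line parsing itself is plain string handling and works the same for any of those CSV-style files. A small self-contained sketch (no PyNeuraLogic; the in-memory lines below are a made-up stand-in for the contents of BZR_A.txt):

```python
# Parse comma-separated "source, target" lines into stripped term pairs,
# the same preprocessing the R.edge(...) loop performs per line.
lines = ["1, 2\n", "2, 3\n", "3, 1\n"]  # stand-in for fp.readlines()

edges = []
for line in lines:
    terms = line.split(",")
    edges.append((terms[0].strip(), terms[1].strip()))

print(edges)  # -> [('1', '2'), ('2', '3'), ('3', '1')]
```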
-
Hello,
I am trying to import some datasets that are available in PyTorch Geometric; you can see my code below.
I realized that, when building these datasets, they have a different structure than, for example, the Mutagenesis dataset that is available in your framework: the data is stored in ds.data for PPI, but in ds.samples for Mutag.
Is the structure supposed to be different for each dataset, or is there some common structure that every dataset should follow? If so, how do I obtain it in this case?
Also, how can I extract the relations from the dataset? In its current form, it can never be grounded; the data in PPI only has some x, y, and edge_index values.
Thank you in advance!