Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in tutorial : inconsistent TemporalNetwork.network type? #59

Open
maximelucas opened this issue Mar 6, 2020 · 4 comments
Open

Comments

@maximelucas
Copy link
Contributor

Copy pasting the first few lines of the tutorial
https://teneto.readthedocs.io/en/latest/tutorial/networkrepresentation.html#temporalnetwork-object
yields an error at tnet.network.head() because tnet created with
tnet.generatenetwork('rand_binomial',size=(5,3), prob=0.5)
is a Numpy ndarray and not a Pandas DataFrame.

I haven't tried other ways of creating tnet, so I'm not sure if tnet.network is consistently a numpy array or not. If so, the tutorial should be updated, otherwise it should be made so that tnet.network is always the same type: either numpy array or pandas DataFrame.

@wiheto
Copy link
Owner

wiheto commented Mar 6, 2020

Nice catch! This is just an outdated tutorial. Thanks for flagging this.

The issue here is that modifications were made to only use pandas arrays when the network/connections are sparse (I think if less than 25% of the connections are non-zero). This random network fills that criteria so it makes it a numpy array.

So I am creating a little to do list for myself here:

  • Update the tutorial.
  • Add code in tutorial to tests that are performed so this doesn't happen again

@maximelucas
Copy link
Contributor Author

maximelucas commented Mar 6, 2020

That makes sense! Setting prob=0.2 yields a Pandas DataFrame. Thanks for the explanation.

I'm guessing you made that choice to optimise the code in some way.
I'm just thinking, if I write some general code, and then run it on some different datasets, some might be a bit more or a bit less sparse than your threshold. Then, the same code might run or not run, or just yield different results, because of this. Does this not have the potential to create some trouble? Of course I guess I could just check the data and adapt the code every time.

@wiheto
Copy link
Owner

wiheto commented Mar 6, 2020

Does this not have the potential to create some trouble?

It is a good point.

  1. All teneto functions are agnostic to whether TemporalNetwork.network is a dataframe or numpy array (and most functions can take HDF dataframes as well (some parts of this are still being optimized though)).

  2. If the user want's to always have dataframe there is an argument when defining TemporalNetwork: forcesparse, which, if True, will make sure the TemporalNetwork.network be a dataframe. I have planned to make this "force_df" and "force_array" but that has been low down on the priority list (or representation=['auto','df','array'])

This behaviour is the an unfortunate trade-off I've had to make for wanting to be quick on small datasets and be able to cope with very large datasets. Trying to find one representation that is (1) familiar for a user to interact with (np or pd), (2) efficient for quick processing, (3) efficient for large networks, is something I hope to implement at one point. But for now, this was the best solution I could think of..

@maximelucas
Copy link
Contributor Author

I sensed it would be something like this, it makes sense, thanks. Your point 1. should make it quite safe. In any case I don't have a better solution either, I jut got surprised because I even prob=0.3 yielded sometimes sparse and sometimes dense nets!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants