Pandas support #34

Open
shayandavoodii opened this issue Aug 30, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@shayandavoodii

I'm trying to load a pickle file with the load function from Pickle.jl:

julia> bud = load(open("MV-Budget.pkl"))

But it leads to an error:

ERROR: AssertionError: Imcompatible protocol version:
    Trying to load version 5 pickle file with version 4 pickler.
    Try setting the `proto` keyword argument when loading, e.g. `load(file; proto = 5)`.
    If that still failed, please open an issue.

Stacktrace:
 [1] execute!(p::Pickler{4}, #unused#::Val{Pickle.OpCodes.PROTO}, arg::UInt8)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:178
 [2] run!(p::Pickler{4}, op::Pickle.OpCodes.OpCode, io::IOStream)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:25
 [3] load(p::Pickler{4}, io::IOStream)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:15
 [4] load(io::IOStream; proto::Int64)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:10
 [5] load(io::IOStream)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:10
 [6] top-level scope
   @ REPL[6]:1
@chengchingwen
Owner

  Try setting the `proto` keyword argument when loading, e.g. `load(file; proto = 5)`.

What is the error log of load("MV-Budget.pkl"; proto = 5)?
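
For context on the version mismatch: pickle streams written with protocol 2 or later begin with a PROTO opcode (byte 0x80) followed by a one-byte version number, so the protocol can be read from the first two bytes before calling load. Below is a minimal sketch in plain Julia. The name pickle_protocol is a hypothetical helper and is not part of Pickle.jl.

```julia
# Hypothetical helper, not part of Pickle.jl.
# Returns the protocol version declared in a pickle header, or `nothing`
# when the stream does not start with the PROTO opcode (protocols 0 and 1).
function pickle_protocol(io::IO)
    header = read(io, 2)                      # PROTO opcode + version byte
    return length(header) == 2 && header[1] == 0x80 ? Int(header[2]) : nothing
end

# Convenience method that opens the file, inspects it, and closes it again.
pickle_protocol(path::AbstractString) = open(pickle_protocol, path)
```

The returned value could then be passed as the proto keyword that the error message suggests.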

@shayandavoodii
Author

  Try setting the `proto` keyword argument when loading, e.g. `load(file; proto = 5)`.

What is the error log of load("MV-Budget.pkl"; proto = 5)?

That will lead to this:

Defer(:build, Defer(:newobj, Defer(:pandas.core.frame.DataFrame)), Dict{Any, Any}("_mgr" => Defer(:reduce, Defer(:pandas.core.internals.managers.BlockManager), (Defer(:reduce, Defer(:pandas._libs.internals._unpickle_block), Defer(:reduce, Defer(:numpy.core.numeric._frombuffer), UInt8[0x02, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x04, 0x00  …  0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00], Defer(:build, Defer(:reduce, Defer(:numpy.dtype), i4, false, true), (3, "<", nothing, nothing, nothing, -1, -1, 0)), (17, 24), F), Defer(:reduce, Defer(:builtins.slice), 0, 17, 1), 2),), Any[Defer(:reduce, Defer(:pandas.core.indexes.base._new_Index), Defer(:pandas.core.indexes.base.Index), Dict{Any, Any}("name" => nothing, "data" => Defer(:build, Defer(:reduce, Defer(:numpy.core.multiarray._reconstruct), Defer(:numpy.ndarray), (0,), UInt8[0x62]), (1, (17,), Defer(:build, Defer(:reduce, Defer(:numpy.dtype), O8, false, true), (3, "|", nothing, nothing, nothing, -1, -1, 63)), false, Any["return5", "return10", "return25", "std5", "std10", "std25", "corr5", "corr10", "corr25", "diff_close5", "diff_close10", "diff_close25", "pred_ret5", "pred_vol5", "pred_cor5", "pred_cor10", "pred_cor25"])))), Defer(:reduce, Defer(:pandas.core.indexes.base._new_Index), Defer(:pandas.core.indexes.base.Index), Dict{Any, Any}("name" => nothing, "data" => Defer(:build, Defer(:reduce, Defer(:numpy.core.multiarray._reconstruct), Defer(:numpy.ndarray), (0,), UInt8[0x62]), (1, (24,), Defer(:build, Defer(:reduce, Defer(:numpy.dtype), O8, false, true), (3, "|", nothing, nothing, nothing, -1, -1, 63)), false, Any["MSFT", "PEP", "TSLA", "AMZN", "LKQ", "ABMD", "MSI", "PH", "NKE", "TM"  …  "EQIX", "EA", "AAP", "TEL", "DG", "EXR", "MDLZ", "FIS", "CRL", "RCL"]))))]), "_metadata" => Any[], "attrs" => Dict{Any, Any}(), "_typ" => "dataframe", "_flags" => Dict{Any, Any}("allows_duplicate_labels" => true)))

But I get this output if I read the pickle file with Pandas.jl:

julia> df = read_pickle("MV-Budget.pkl")
       return5  return10  return25  std5  std10  ...  pred_ret5  pred_vol5  pred_cor5  pred_cor10  pred_cor25
MSFT         2         1         4     4      4  ...          5          7          3           3           2
PEP          8         8         7     7      9  ...          4         10          4           4           4
TSLA         1         5         6     2      2  ...          1          2          9           9           9
AMZN         4         4         5     1      1  ...          4          5          5           6           7
LKQ          4         4         3    10     10  ...          7          3         10          10          10
ABMD         7         6         1     5      5  ...          6          1          7           8           8
MSI          6         6         8     6      7  ...          1          9          5           5           5
PH           2         3         4     3      3  ...         10          5          1           1           1
NKE          3         2         3     7      6  ...         10          4          2           2           3
TM          10         7         6     9      8  ...          7         10          8           7           7
EOG          1         1         1     6      6  ...         10          6         10          10          10
GOOGL        3         3         4     1      1  ...          8          7          1           3           4
NFLX        10        10        10     4      4  ...          5          1          4           2           3
GS           4         2         2     3      3  ...          2          4          2           1           1
EQIX         7         9         9     1      1  ...          1          7          6           6           5
EA           9         8         8    10     10  ...          8          3          6           5           6
AAP          9        10         9     9      8  ...          9          2         10          10          10
TEL          5         4         2     8      7  ...          9         10          1           1           1
DG          10        10        10     7      9  ...          2          1          4           4           4
EXR          8         5         5     8      7  ...          6          8          9           9           8
MDLZ         7         7         7    10     10  ...          7          9          7           8           9
FIS          5         9         7     4      5  ...          4          8          3           4           2
CRL          6         7        10     5      4  ...          3          6          7           7           6
RCL          1         1         1     2      2  ...          3          4          8           7           7

[24 rows x 17 columns]

@chengchingwen chengchingwen changed the title ERROR: AssertionError: Imcompatible protocol version Pandas support Aug 30, 2022
@chengchingwen chengchingwen added the enhancement New feature or request label Aug 30, 2022
@chengchingwen
Owner

That will lead to this:

That result indicates that the file contains objects unknown to Pickle.jl. Pandas.jl doesn't have this issue because it calls Python directly. To make this work, we need to add the corresponding method mapping for each method you see in the Defer object.
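
To illustrate what one such mapping would have to do, consider the numpy.core.numeric._frombuffer entry in the dump above. It carries a raw byte buffer, a dtype of i4 with byte order "<", the shape (17, 24), and the order flag F. Its leading bytes 02 00 00 00 01 00 00 00 04 00 decode to 2, 1, 4, which matches the start of the MSFT row in the Pandas.jl output. The sketch below decodes such a buffer using only Base Julia. The name frombuffer_i4 is hypothetical, it covers only this little-endian Int32 case, and it does not use Pickle.jl's actual extension mechanism. The toy buffer holds the first two columns for the first three tickers from the table above.

```julia
# Hypothetical stand-in for a Julia mapping of numpy.core.numeric._frombuffer,
# restricted to the little-endian Int32 ("<i4"), order 'F' case seen in the dump.
function frombuffer_i4(bytes::AbstractVector{UInt8}, shape::NTuple{2,Int})
    length(bytes) == 4 * prod(shape) ||
        throw(DimensionMismatch("buffer length does not match shape $shape"))
    values = ltoh.(reinterpret(Int32, bytes))   # decode little-endian 4-byte integers
    return reshape(values, shape)               # order 'F' is Julia's native column-major layout
end

# Toy buffer: (return5, return10) for MSFT, PEP, TSLA, stored ticker by ticker.
bytes = collect(reinterpret(UInt8, htol.(Int32[2, 1, 8, 8, 1, 5])))

block = frombuffer_i4(bytes, (2, 3))   # one row per DataFrame column, one column per ticker
table = permutedims(block)             # 3×2: one row per ticker, as pandas prints it
```

Because the block is stored with one row per DataFrame column, the transpose step is what recovers the orientation shown in the Pandas.jl output.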

@zsz00

zsz00 commented Sep 22, 2022

see #25

@DarioSlaifsteinSk

So... is there any way to transform from Defer to DataFrame? Or to build the DataFrame from the Defer obj?

@chengchingwen
Owner

build the DataFrame from the Defer obj?

It's definitely doable. This is how we support a new Python object with Pickle.jl, but we need someone to actually implement it.
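
As a rough illustration of the last step, the dump in this thread contains three ingredients: a values block with one row per column, a list of column names, and a list of row labels. Assuming those have already been decoded into plain Julia arrays, they can be combined into a NamedTuple of equal-length vectors. This is a column table in the Tables.jl sense, which the DataFrames.jl DataFrame constructor should accept. The name column_table is a hypothetical helper, and the sketch does not walk a real Defer tree.

```julia
# Hypothetical helper: combine an already-decoded values block with its labels.
# `block` has one row per DataFrame column and one column per row label,
# mirroring the (17, 24) layout seen in the dump.
function column_table(block::AbstractMatrix, columns::Vector{String}, index::Vector{String})
    size(block) == (length(columns), length(index)) ||
        throw(DimensionMismatch("block size does not match the labels"))
    names = (:index, Symbol.(columns)...)                            # field names of the table
    values = (index, (block[i, :] for i in eachindex(columns))...)   # one vector per column
    return NamedTuple{names}(values)
end

# Small excerpt of the data shown earlier in the thread.
block = Int32[2 8 1; 1 8 5]
t = column_table(block, ["return5", "return10"], ["MSFT", "PEP", "TSLA"])
```

Here t.return5 holds the return5 values for the three tickers and t.index holds the ticker labels, so the pandas index survives as an ordinary column.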
