-
Hi @Enucatl,

The main issue that had been raised was a change to how Pandas handles null data when converting to NumPy (which we rely on for conversions from Pandas to PyKX Tables/Vectors). It's illustrated by the following difference in behaviour, which trickles down to PyKX:

Pandas 2.1.4:
>>> pd.Series([1, pd.NA, 3], dtype=pd.Int64Dtype()).to_numpy()
array([1, <NA>, 3], dtype=object)

Pandas 2.2.0:
>>> pd.Series([1, pd.NA, 3], dtype=pd.Int64Dtype()).to_numpy()
array([ 1., nan,  3.])

The result of this is that the data we get back from PyKX won't round-trip appropriately. We're hesitant to force the data conversion
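As an illustration of the behaviour change described above, here is a minimal sketch of how the null representation can be pinned explicitly on the Pandas side, so the result does not depend on the installed version. It uses only the `dtype` and `na_value` arguments of `Series.to_numpy`. It is not how PyKX performs its conversion, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Nullable integer series with one missing value, as in the example above.
s = pd.Series([1, pd.NA, 3], dtype=pd.Int64Dtype())

# Ask for an object array explicitly: the missing value stays pd.NA,
# regardless of what the default to_numpy() conversion does.
as_object = s.to_numpy(dtype=object, na_value=pd.NA)

# Ask for a float array explicitly: the missing value becomes NaN and
# the integers are widened to float64.
as_float = s.to_numpy(dtype="float64", na_value=np.nan)

print(as_object.dtype, as_float.dtype)
```

The object form keeps the integer/null distinction at the cost of a slow, boxed array. The float form is compact but loses the integer type, which is the round-trip problem mentioned above.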
-
That does work (python 3.12, pandas 2.2.2, pykx 2.5.2):

import pykx
import pandas as pd
# Create a simple table using PyKX
kx_table = pykx.Table(
    data={
        "date": pykx.q("2023.01.01 2023.01.02 2023.01.03"),
        "sym": ["AAPL", "GOOGL", "MSFT"],
        "price": [150.25, 2800.75, 310.50],
        "volume": [1000000, 500000, 750000],
    }
)
print("pandas version", pd.__version__)
print("pykx version", pykx.__version__)
print()
print("KX Table:")
print(kx_table)
# Convert the KX table directly to a PyArrow Table
arrow_table = kx_table.pa()
pandas_table = arrow_table.to_pandas(types_mapper=pd.ArrowDtype)
print()
print(pandas_table.info())

$ /opt/home/user/venv/pykx/bin/python test.py
pandas version 2.2.2
pykx version 2.5.2
KX Table:
date       sym   price   volume
--------------------------------
2023.01.01 AAPL  150.25  1000000
2023.01.02 GOOGL 2800.75 500000
2023.01.03 MSFT  310.5   750000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   date    3 non-null      timestamp[s][pyarrow]
 1   sym     3 non-null      string[pyarrow]
 2   price   3 non-null      double[pyarrow]
 3   volume  3 non-null      int64[pyarrow]
dtypes: double[pyarrow](1), int64[pyarrow](1), string[pyarrow](1), timestamp[s][pyarrow](1)
memory usage: 229.0 bytes
None
-
I read in the docs (and in the code) that pandas 2.2 is not supported yet. Are there any details on the types problem that is mentioned? Is this likely to be an easy update in the near future?