Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decimal logical type : TypeError: can only concatenate str (not "int") to str #43

Open
balaby25 opened this issue Nov 14, 2022 · 1 comment

Comments

@balaby25
Copy link

am trying to convert an existing csv to avro using pandavro.

am not able to resolve the below error:
File "fastavro/_logical_writers.pyx", line 130, in fastavro._logical_writers.prepare_bytes_decimal
File "fastavro/_logical_writers.pyx", line 143, in fastavro._logical_writers.prepare_bytes_decimal
TypeError: can only concatenate str (not "int") to str

i did check my csv, avsc and pandavro lines of code multiple times.. am not able to find what is the problem. am not savvy enough to call it a bug.
can anyone provide me with some pointers.

data in the csv : 999.879
the column p_cost in avsc:
{ "name": "p_cost", "type": {"name": "decimalEntry", "type": "bytes", "logicalType": "decimal", "precision": 15, "scale": 3} },
the lines of code. :

def convert_to_decimal(val):
"""
Convert the string number value to a Decimal
- Must set precision and scale beforehand
"""
return Decimal(val)

   schema_promotion = load_schema("promotion.avsc")
   df_promotion = pd.read_csv( '/scratch/tpcds_1/promotion/promotion.dat' , delimiter='|',header=None,usecols=[0,1,2,3,4,5,6,7,8,9,10,11,12

,13,14,15,16,17,18],names=['p_promo_sk',....,'p_cost',...,'p_discount_active']
,dtype={'p_cost': 'str'})

getcontext().prec = 15 # set precision of all future decimals
type(df_promotion['p_cost'])

df_promotion['p_cost'] = df_promotion['p_cost'].apply(convert_to_decimal)
pdx.to_avro('test_promotion.avro', df_promotion, schema=schema_promotion )

throws below error:

Traceback (most recent call last):
File "perfectlyrandom.py", line 313, in
promotion()
File "perfectlyrandom.py", line 262, in promotion
pdx.to_avro('test_promotion.avro', df_promotion, schema=schema_promotion )
File "/home/opc/.local/lib/python3.8/site-packages/pandavro/init.py", line 322, in to_avro
fastavro.writer(f, schema=schema,
File "fastavro/_write.pyx", line 727, in fastavro._write.writer
File "fastavro/_write.pyx", line 680, in fastavro._write.Writer.write
File "fastavro/_write.pyx", line 432, in fastavro._write.write_data
File "fastavro/_write.pyx", line 422, in fastavro._write.write_data
File "fastavro/_write.pyx", line 366, in fastavro._write.write_record
File "fastavro/_write.pyx", line 387, in fastavro._write.write_data
File "fastavro/_logical_writers.pyx", line 130, in fastavro._logical_writers.prepare_bytes_decimal
File "fastavro/_logical_writers.pyx", line 143, in fastavro._logical_writers.prepare_bytes_decimal
TypeError: can only concatenate str (not "int") to str

if full schema definition and pandas df definition is needed, i shall provide the same.
pip list:
avro-python3 1.10.2
fastavro 1.5.1
numpy 1.23.3
pandas 1.5.0
pandavro 1.7.1

@jbvaningen
Copy link

I'm not sure about the details of your code, but I do have a workaround suggestion:

You can avoid using pandavro by using fastavro (also used by pandavro itself) to write to Avro directly. You will have to figure out how to read and parse the CSV yourself. To me it looks like you can convert the CSV to JSON, and then use fastavro.json_read. You will also have to define the Avro schema, this is something that pandavro currently solves for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants