有个任务需要通过pandas向oracle写数据

最开始没加dtype,发现to_sql很慢,几百条数据都要十多秒;而且有时候会有如下莫名其妙的报错,但仔细检查数据发现数据是没问题的

  sqlalchemy_exception, with_traceback=exc_info[2], from_=e
	  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
	    raise exception
	  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1752, in _execute_context
	    cursor, statement, parameters, context
	  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/oracle/cx_oracle.py", line 1347, in do_executemany
	    cursor.executemany(statement, parameters)
	sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00932: inconsistent datatypes: expected DATE got CLOB

后面加上 to_sql 中加上 dtype 参数后,就快非常快了,上万条数据不到1s,而且同样的数据也没报错了。


解决问题时,发现几篇相关文章,记录下

  1. Pandas to_sql详解
    https://blog.csdn.net/weixin_44278512/article/details/102959370

  2. 嫌pandas的to_sql方法过慢?神方法让你一分钟搞定
    https://blog.csdn.net/chenKFKevin/article/details/72911525

import cStringIO
 
output = cStringIO.StringIO()
# ignore the index
df_a.to_csv(output, sep='\t',index = False, header = False)
output.getvalue()
# jump to start of stream
output.seek(0)
 
connection = engine.raw_connection() #engine 是 from sqlalchemy import create_engine
cursor = connection.cursor()
# null value become ''
cursor.copy_from(output,table_name,null='')
connection.commit()
cursor.close()

这个好像是专门针对postgresql 的
https://gist.github.com/catawbasam/3164289

  1. 关于pandas中to_sql性能太慢的优化
    https://blog.csdn.net/weixin_44313745/article/details/126994182
Logo

DAMO开发者矩阵,由阿里巴巴达摩院和中国互联网协会联合发起,致力于探讨最前沿的技术趋势与应用成果,搭建高质量的交流与分享平台,推动技术创新与产业应用链接,围绕“人工智能与新型计算”构建开放共享的开发者生态。

更多推荐