A file output plugin for Embulk that writes files to HDFS.
- Plugin type: file output
- Load all or nothing: yes
- Resume supported: no
- Cleanup supported: no
- config_files: list of paths to Hadoop's configuration files (array of strings, default: [])
- config: overwrites configuration parameters (hash, default: {})
- path_prefix: prefix of target files (string, required)
- file_ext: suffix of target files (string, required)
- sequence_format: format for the sequence part of target files (string, default: '%03d.%02d.')
- rewind_seconds: when path_prefix contains a date format (like /tmp/embulk/%Y-%m-%d/out), the format is interpreted using the current time minus this number of seconds (int, default: 0)
- overwrite: overwrite files when files with the same names already exist (boolean, default: false)
  - caution: setting this property to true does not ensure idempotence. To ensure idempotence, you need a procedure that removes output files before or after running.
- doas: username used to access HDFS (string, default: the executing user)
- delete_in_advance: delete files and directories matching path_prefix in advance (enum, default: NONE)
  - NONE: do nothing
  - FILE_ONLY: delete files
  - RECURSIVE: delete files and directories
If you use the hadoop user (the HDFS admin user) as doas and set delete_in_advance to RECURSIVE, embulk-output-hdfs can delete any files and directories you indicate as path_prefix. This means embulk-output-hdfs can destroy your HDFS. So please be careful when you use the delete_in_advance and doas options.
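As a sketch of a more cautious setup, the fragment below limits deletion to files and runs as a non-admin user. The user name etl_user and the directory are hypothetical placeholders; only the option names and enum values come from the list above.

```yaml
out:
  type: hdfs
  # Hypothetical dedicated output directory; keep path_prefix as narrow as possible.
  path_prefix: '/tmp/embulk/hdfs_output/out'
  file_ext: 'txt'
  # Hypothetical non-admin user, used instead of the hadoop admin user.
  doas: etl_user
  # FILE_ONLY deletes files; RECURSIVE would also delete directories.
  delete_in_advance: FILE_ONLY
  formatter:
    type: csv
```

Keeping path_prefix inside a directory that only this job writes to limits what a misconfigured run can remove.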
out:
  type: hdfs
  config_files:
    - /etc/hadoop/conf/core-site.xml
    - /etc/hadoop/conf/hdfs-site.xml
  config:
    fs.defaultFS: 'hdfs://hdp-nn1:8020'
    fs.hdfs.impl: 'org.apache.hadoop.hdfs.DistributedFileSystem'
    fs.file.impl: 'org.apache.hadoop.fs.LocalFileSystem'
  path_prefix: '/tmp/embulk/hdfs_output/%Y-%m-%d/out'
  file_ext: 'txt'
  overwrite: true
  formatter:
    type: csv
    encoding: UTF-8
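To illustrate rewind_seconds, here is a variation on the configuration above. It assumes a job that runs after midnight and should write into the previous day's directory. The value 86400 and the schedule are illustrative assumptions, not recommendations from the plugin.

```yaml
out:
  type: hdfs
  config_files:
    - /etc/hadoop/conf/core-site.xml
    - /etc/hadoop/conf/hdfs-site.xml
  # The date pattern is expanded using (current time - rewind_seconds).
  path_prefix: '/tmp/embulk/hdfs_output/%Y-%m-%d/out'
  file_ext: 'txt'
  # Assumed value: 86400 seconds = 24 hours, so the date is taken from one day earlier.
  rewind_seconds: 86400
  formatter:
    type: csv
    encoding: UTF-8
```

With the default of 0, the same path_prefix would expand to the date of the run itself.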
$ ./gradlew gem
$ ./gradlew classpath
$ bundle exec embulk run -I lib example.yml