Parses Microsoft Excel files(xls, xlsx) read by other file input plugins.
This plugin uses Apache POI.
- Plugin type: parser
- Guess supported: no
- Embulk 0.10 or later
in:
type: any file input plugin type
parser:
type: poi_excel
sheets: ["DQ10-orb"]
skip_header_lines: 1 # first row is header.
columns:
- {name: row, type: long, value: row_number}
- {name: get_date, type: timestamp, cell_column: A, value: cell_value}
- {name: orb_type, type: string}
- {name: orb_name, type: string}
- {name: orb_shape, type: long}
- {name: drop_monster_name, type: string}
if omit value, specified cell_value
.
if omit cell_column when value is cell_value
, specified next column.
- sheets: sheet name. can use wildcards
*
,?
. (list of string, required) - record_type: record type. (
row
,column
orsheet
. default:row
) - skip_header_lines: skip rows when record_type=
row
(skip columns when record_type=column
). ignored when record_type=sheet
. (integer, default:0
) - columns: column definition. see below. (hash, required)
- sheet_options: sheet option. see below. (hash, default: null)
- name: Embulk column name. (string, required)
- type: Embulk column type. (string, required)
- value: value type. see below. (string, default:
cell_value
) - column_number: same as cell_column.
- cell_column: Excel column number. see below. (string, default: next column when record_type=
row
) - cell_row: Excel row number. see below. (integer, default: next row when record_type=
column
) - cell_address: Excel cell address such as
A1
,Sheet1!B3
. (string, not required) - numeric_format: format of numeric(double) to string such as
%4.2f
. (default: Java's Double.toString()) - attribute_name: use with value
cell_style
,cell_font
, etc. see below. (list of string) - on_cell_error: processing method of Cell error. see below. (string, default:
constant
) - formula_handling: processing method of formula. see below. (
evaluate
orcashed_value
. default:evaluate
) - on_evaluate_error: processing method of evaluate formula error. see below. (string, default:
exception
) - formula_replace: replace formula before evaluate. see below.
- on_convert_error: processing method of convert error. see below. (string, default:
exception
) - search_merged_cell: search merged cell when cell is BLANK. (
none
,linear_search
,tree_search
orhash_search
, default:hash_search
)
-
cell_value
: value in cell. -
cell_formula
: formula in cell. (if cell is not formula, samecell_value
.) -
cell_style
: all cell style attributes. returned json string. see attribute_name. (type requiredstring
) -
cell_font
: all cell font attributes. returned json string. see attribute_name. (type requiredstring
) -
cell_comment
: all cell comment attributes. returned json string. see attribute_name. (type requiredstring
) -
cell_type
: cell type. returned Cell.getCellType() of POI. -
cell_cached_type
: cell cached formula result type. returned Cell.getCachedFormulaResultType() of POI when CellType==FORMULA, otherwise same ascell_type
(returned Cell.getCellType()). -
file_name
: excel file name. -
sheet_name
: sheet name. -
row_number
: row number(1 origin). -
column_number
: column number(1 origin). -
constant
: constant value.constant.
value: specified value.constant
: null.
Basically used for record_type=row
.
A
,B
,C
,...: column number of "A1 format".- number: column number (1 origin).
+
: next column.+
name: next column of name.+
number: number next column.-
: previous column.-
name: previous column of name.-
number: number previous column.=
: same column.=
name: same column of name.
Basically used for record_type=column
.
- number: row number (1 origin).
When value is cell_style
, cell_font
, or cell_comment
, by default, it retrieves all attributes and converts them into a JSON string.
(Since it returns a JSON string, the type must be string
.)
columns:
- {name: foo, type: string, cell_column: A, value: cell_style}
By specifying the attribute_name, it retrieves only the specified attributes and converts them into a JSON string.
- attribute_name: attribute names. (list of string)
columns:
- {name: foo, type: string, cell_column: A, value: cell_style, attribute_name: [border_top, border_bottom, border_left, border_right]}
Additionally, by appending a period after cell_style
or cell_font
and specifying the attribute name, you can retrieve only that attribute.
In this case, it won't result in a JSON string, and you need to specify the type that matches the attribute's type.
columns:
- {name: foo, type: long, value: cell_style.border}
- {name: bar, type: long, value: cell_font.color}
In cell_style
and cell_font
, if cell_column is omitted, it targets the same column as the previous one.
(In cell_value
, omitting cell_column
causes it to move to the next column.)
Processing method of Cell error (#DIV/0!
, #REF!
, etc).
columns:
- {name: foo, type: string, cell_column: A, value: cell_value, on_cell_error: error_code}
constant
: set null. (default)constant.
value: set specified value.error_code
: set error code.exception
: throw exception.
Processing method of formula.
columns:
- {name: foo, type: string, cell_column: A, value: cell_value, formula_handling: cashed_value}
evaluate
: evaluate formula. (default)cashed_value
: cashed value in cell.
Processing method of evaluate formula error.
columns:
- {name: foo, type: string, cell_column: A, value: cell_value, on_evaluate_error: constant}
constant
: set null.constant.
value: set specified value.exception
: throw exception. (default)
Replace formula before evaluate.
columns:
- {name: foo, type: string, cell_column: A, value: cell_value, formula_replace: [{regex: aaa, to: "A${row}"}, {regex: bbb, to: "B${row}"}]}
${row}
is replaced with the current row number.
${column}
is replaced with the current column string.
Processing method of convert error. ex) Excel boolean to Embulk timestamp
columns:
- {name: foo, type: timestamp, format: "%Y/%m/%d", cell_column: A, value: cell_value, on_convert_error: constant.9999/12/31}
constant
: set null.constant.
value: set specified value.exception
: throw exception. (default)
Options of individual sheet.
parser:
type: poi_excel
sheets: [Sheet1, Sheet2]
columns:
- {name: date, type: timestamp, cell_column: A}
- {name: foo, type: string}
- {name: bar, type: long}
sheet_options:
Sheet1:
skip_header_lines: 1
columns:
foo: {cell_column: B}
bar: {cell_column: C}
Sheet2:
skip_header_lines: 0
columns:
foo: {cell_column: D}
bar: {value: constant.0}
sheet_options is map of sheet name.
Map values are skip_header_lines, columns.
columns is map of column name.
Map values are same columns in parser (excluding name
, type
).
-
install plugin
$ mvn dependency:get -Dartifact=io.github.hishidama.embulk:embulk-parser-excel-poi:0.2.0
-
add setting to $HOME/.embulk/embulk.properties
plugins.parser.poi_excel=maven:io.github.hishidama.embulk:excel-poi:0.2.0
$ ./gradlew test
$ ./gradlew package
./gradlew generatePomFileForMavenJavaPublication
mvn install -f build/publications/mavenJava/pom-default.xml
./gradlew publishToMavenLocal