
Why does iterrowmapmany convert each row to a Record instance? #623

Open
jossefaz opened this issue Jun 21, 2022 · 2 comments

jossefaz commented Jun 21, 2022

In this method:

def iterrowmapmany(source, rowgenerator, header, failonerror):

Each row is converted to a Record instance.

it = (Record(row, flds) for row in it)
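(For context, here is roughly what that wrapping looks like, if I read the code correctly; the field names below are made up, not from my real table.)

```python
# Rough sketch of the conversion quoted above (made-up field names).
from petl.util.base import Record

flds = ['id', 'sex', 'age']
row = (1, 'male', 16)
rec = Record(row, flds)  # the raw row is wrapped together with the header fields
```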

In my use case, my "rowgenerator" helper function needs a named tuple rather than a plain row as input. It is much more convenient to access named attributes than to use unclear index notation like row[5].
For that purpose I tried to use rowmapmany this way:

etl.rowmapmany(etl.namedtuples(my_table), rowgenerator=mapper, header=headers)

I thought that using namedtuples would solve my issue, because my rows have more than 100 columns, so it is hard to use indexes (e.g. row[57]) where a named tuple could simply give me the convenience of row.my_target_attribute.

But because of this conversion to a Record instance, each namedtuple from the input is converted back to a plain row of values, which is a bit frustrating since it forces us to use index notation in the mapper function (very hard to read).
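To make it concrete, here is roughly the kind of mapper I am writing (the field names are made up, my real table has 100+ columns):

```python
from collections import namedtuple

# Made-up fields; the real table has far more columns.
MyRow = namedtuple('MyRow', ['id', 'sex', 'age'])

def mapper(row):
    # Readable: access by name...
    yield [row.id, 'age_months', row.age * 12]
    # ...instead of having to remember that column 2 is the age
    # and writing row[2] * 12.
```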

When I remove this line

it = (Record(row, flds) for row in it)

It works like a charm...
Why is this Record conversion important?
If it is not, could we remove it from the iterrowmapmany method?

Please help 🙏


jossefaz commented Jun 21, 2022

Another reason not to convert to a Record: using namedtuples as input for the row mapper frees us from any binding to field order, since attributes in the mapper are accessed by name and not by position.

So no matter what the field order of the input source is, the mapper will work as expected, even if the field order changes between two inputs that share the same output target.
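For example (with made-up field names), the same mapper keeps working when the column order changes, as long as access is by name:

```python
from collections import namedtuple

# Two hypothetical sources with the same fields in a different order.
RowA = namedtuple('RowA', ['id', 'sex', 'age'])
RowB = namedtuple('RowB', ['age', 'id', 'sex'])

def mapper(row):
    # Access by name, so the position of 'age' in the source does not matter.
    yield [row.id, 'age_months', row.age * 12]

print(list(mapper(RowA(1, 'male', 16))))  # [[1, 'age_months', 192]]
print(list(mapper(RowB(16, 1, 'male'))))  # same output
```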

@bmaggard (Contributor)

https://petl.readthedocs.io/en/latest/util.html#petl.util.base.records

"a record is a hybrid object supporting all possible ways of accessing values."

The examples for rowmapmany demonstrate this:

https://petl.readthedocs.io/en/latest/transform.html#petl.transform.maps.rowmapmany
```python
>>> def rowgenerator(row):
...     transmf = {'male': 'M', 'female': 'F'}
...     yield [row[0], 'gender',
...            transmf[row['sex']] if row['sex'] in transmf else None]
...     yield [row[0], 'age_months', row.age * 12]
...     yield [row[0], 'bmi', row.height / row.weight ** 2]
...
>>> table2 = etl.rowmapmany(table1, rowgenerator,
...                         header=['subject_id', 'variable', 'value'])
```
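Note that the generator above already mixes index access (row[0]), key access (row['sex']) and attribute access (row.age) on the same record. A quick sketch of the same idea on a made-up table, along the lines of the records docs:

```python
>>> import petl as etl
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> for rec in etl.records(table):
...     print(rec[0], rec['foo'], rec.foo)  # index, key and attribute access
...
a a a
b b b
```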
