ArrowIndexError when converting PyArrow Table with nested null values to Pandas DataFrame #44881

Open
emanueledomingo opened this issue Nov 28, 2024 · 0 comments


Describe the bug, including details regarding any error messages, version, and platform.

The code below raises an ArrowIndexError when converting a PyArrow Table with nested null values into a Pandas DataFrame. Everything works fine if I use to_pydict instead.

import pyarrow as pa
import pandas as pd
import json

json_data = json.loads("""
[
    {
        "key1": [
            {
                "key2": null
            },
            {
                "key2": null
            }
        ]
    }
]
""")

pa_table = pa.Table.from_pylist(json_data)
my_dict = pa_table.to_pydict()  # This works as expected: {'key1': [[{'key2': None}, {'key2': None}]]}
df = pa_table.to_pandas(types_mapper=pd.ArrowDtype).to_dict()  # This line raises ArrowIndexError

I'm using pyarrow 17 (pandas 2.2.3)

Component(s)

Python
