Bug: tojson() tries to read stdin twice, returns empty results #668

Open
1 task done
yaniv-aknin opened this issue Apr 28, 2024 · 3 comments
Labels
Bug It must work in all situations, but this failed

Comments

@yaniv-aknin

yaniv-aknin commented Apr 28, 2024

What happened?

When using the petl executable, tojson() on a table read from stdin returns an empty result:

$ petl 'dummytable().head(3).tocsv()' | petl 'fromcsv().tojson()'
[]
$

This also happens with a trivial program that doesn't use the executable (see repro.py below):

$ ./repro.py < repro.csv
[]
$

What is the expected behavior?

I'd expect the rows read from stdin to be written out as a JSON array.

For example, tocsv() doesn't exhibit this problem:

$ petl 'dummytable().head(3).tocsv()' | petl 'fromcsv().tocsv()'
foo,bar,baz
82,bananas,0.7873787427711181
3,oranges,0.13771232086689877
13,pears,0.24287642641761387
$

Reproducible test case

This is repro.py. Passing CSV data on stdin will emit an empty JSON array.

#!/usr/bin/env python3

import petl

petl.fromcsv().tojson()

In which version of petl have you found the bug?

v1.7.15

Version

Python 3.12

What OS are you seeing the problem on?

MacOS

What OS version are you using?

No response

What package manager you used to install?

Other

What's the current installed packages?

No response

Relevant log output

No response

Additional Notes

I wasn't sure how to fix it, but I'm pretty sure the bug is that sys.stdin is read twice (this line in tojson() invokes CSVView.__iter__ twice).

The first read depletes the lines from stdin and incorrectly discards the results. I'll try to investigate this further and report here if I do, but I also wanted other folks to be aware of the bug.
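To illustrate the mechanism described above, here is a minimal pure-Python model that does not use petl. `StreamCSVView`, `tojson_one_pass` and `tojson_two_passes` are hypothetical stand-ins, not petl's actual classes or functions: the view starts a fresh `csv.reader` on the same one-shot stream each time it is iterated, and the two-pass serializer discards its first pass, which is what the report suggests tojson() effectively does.

```python
import csv
import io
import json


class StreamCSVView:
    """Hypothetical stand-in for a petl view over sys.stdin (not petl's CSVView).

    Every call to __iter__ starts a new csv.reader on the *same* stream,
    so rows consumed by an earlier iteration cannot be read again.
    """

    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        return csv.reader(self.stream)


def tojson_one_pass(table):
    """Serialize header + data rows to a JSON array of objects, iterating once."""
    it = iter(table)
    header = next(it, None)
    if header is None:  # nothing left to read
        return json.dumps([])
    return json.dumps([dict(zip(header, row)) for row in it])


def tojson_two_passes(table):
    """Same output format, but only after a first pass whose rows are discarded."""
    list(table)  # pass 1: drains the stream, result thrown away
    return tojson_one_pass(table)  # pass 2: the stream is already exhausted


data = "foo,bar\n1,apples\n2,pears\n"
print(tojson_one_pass(StreamCSVView(io.StringIO(data))))
# [{"foo": "1", "bar": "apples"}, {"foo": "2", "bar": "pears"}]
print(tojson_two_passes(StreamCSVView(io.StringIO(data))))
# []
```

The model reproduces the reported symptom: the same input yields data on a single pass and `[]` once an earlier pass has consumed the stream. Whether petl's real tojson() drains stdin in exactly this way is the reporter's hypothesis; the sketch only shows that the hypothesis is sufficient to explain the output.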

Code of Conduct

  • I agree to follow this project's Code of Conduct
@yaniv-aknin yaniv-aknin added the Bug It must work in all situations, but this failed label Apr 28, 2024
@juarezr
Member

juarezr commented Apr 28, 2024

It looks like something went wrong with tocsv(), because this pattern works with other similar functions:

❯ petl 'dummytable().head(3).tojson()' | petl 'fromjson(source=None)'
+-----+-----------+---------------------+
| foo | bar       | baz                 |
+=====+===========+=====================+
|  65 | 'pears'   |  0.9035174673053437 |
+-----+-----------+---------------------+
|  28 | 'bananas' | 0.38930455757975013 |
+-----+-----------+---------------------+
|  67 | 'oranges' |  0.4340697139584314 |
+-----+-----------+---------------------+

❯ petl 'dummytable().head(3).tojson()' | petl 'fromjson(source=None).tocsv()'
foo,bar,baz
5,bananas,0.6813039656995481
34,pears,0.36447357286299165
47,bananas,0.39808978898927116

@yaniv-aknin
Author

yaniv-aknin commented Apr 28, 2024

I don't think so:

$ petl 'dummytable().topickle("test.pkl")'
$ petl 'frompickle().tojson()' < test.pkl
[]
$

I think the problem is "tojson() reading from a table initialized from stdin", because tojson() iterates the underlying table twice and the underlying table doesn't persist what it read from stdin on the first time.

(none of your examples make tojson() read from something initialized from stdin, like fromcsv() or frompickle())
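One way to make a second iteration safe is for the table to persist what it read on the first pass. The sketch below is a minimal, hypothetical `ReplayableView` wrapper (not a petl class): it buffers every row the first time it is iterated and replays that buffer afterwards, so a consumer that iterates twice, like the hypothetical `to_json_after_peek` here, still sees all the data.

```python
import csv
import io
import json


class ReplayableView:
    """Hypothetical wrapper that persists rows from a one-shot source.

    The first iteration reads the whole source into memory; every later
    iteration replays the stored rows instead of touching the source again.
    """

    def __init__(self, source):
        self._source = source  # any one-shot iterable of rows
        self._rows = None      # filled on the first iteration

    def __iter__(self):
        if self._rows is None:
            self._rows = [list(row) for row in self._source]
        return iter(self._rows)


def to_json_after_peek(table):
    """Hypothetical two-pass consumer: peeks at the header, then re-iterates."""
    header = next(iter(table), None)  # pass 1: only looks at the header
    if header is None:
        return json.dumps([])
    rows = iter(table)                # pass 2: starts from the top again
    next(rows)                        # skip the header row
    return json.dumps([dict(zip(header, row)) for row in rows])


stdin_like = io.StringIO("foo,bar\n1,apples\n2,pears\n")
table = ReplayableView(csv.reader(stdin_like))
print(to_json_after_peek(table))
# [{"foo": "1", "bar": "apples"}, {"foo": "2", "bar": "pears"}]
```

The trade-off is memory: the whole input is held after the first pass. A lazier variant could cache rows only as they are requested, at the cost of more bookkeeping when a consumer stops partway through.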

@juarezr
Member

juarezr commented Apr 28, 2024

Certainly, we need to:

  • review if all functions work when called in the petl executable.
  • add some test cases to ensure that future changes will not break it accidentally.
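As a starting point for such test cases, the sketch below runs a child Python process with CSV piped to its stdin, mirroring the `./repro.py < repro.csv` scenario, and fails if the JSON output is empty. To keep it runnable without petl installed, `CHILD_SCRIPT` is a hypothetical pure-Python stand-in; in petl's own suite it would presumably be replaced by the repro.py body (`import petl; petl.fromcsv().tojson()`). The helper and test names are illustrative only.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for repro.py. In a petl test it would be:
#     "import petl; petl.fromcsv().tojson()"
CHILD_SCRIPT = (
    "import csv, json, sys\n"
    "rows = list(csv.reader(sys.stdin))\n"
    "header, body = (rows[0], rows[1:]) if rows else ([], [])\n"
    "print(json.dumps([dict(zip(header, r)) for r in body]))\n"
)

# Rows taken from the tocsv() output earlier in this issue.
SAMPLE_CSV = "foo,bar,baz\n82,bananas,0.7873787427711181\n3,oranges,0.13771232086689877\n"


def run_with_stdin(script, stdin_text):
    """Run `script` in a fresh interpreter with `stdin_text` piped to its stdin."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return result.stdout


def test_tojson_from_stdin_is_not_empty():
    records = json.loads(run_with_stdin(CHILD_SCRIPT, SAMPLE_CSV))
    assert records != [], "rows piped on stdin were lost"
    assert records[0] == {"foo": "82", "bar": "bananas", "baz": "0.7873787427711181"}


if __name__ == "__main__":
    test_tojson_from_stdin_is_not_empty()
    print("ok")
```

Running the check in a separate process keeps it close to the real CLI usage, where stdin is a genuine one-shot pipe rather than an in-memory buffer.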

Looking further, I quickly found what looks like another inconsistency:

❯ petl 'dummytable().head(3).tojson()' | petl 'fromjson(source=None).tohtml()'
<table class='petl'>
<thead>
<tr>
<th>foo</th>
<th>bar</th>
<th>baz</th>
</tr>
</thead>
<tbody>
<tr>
<td style='text-align: right'>23</td>
<td>oranges</td>
<td style='text-align: right'>0.5601641490162261</td>
</tr>
<tr>
<td style='text-align: right'>12</td>
<td>oranges</td>
<td style='text-align: right'>0.6160886095315175</td>
</tr>
<tr>
<td style='text-align: right'>23</td>
<td>pears</td>
<td style='text-align: right'>0.8090897047903948</td>
</tr>
</tbody>
</table>
❯ petl 'dummytable().head(3).tojson()' | petl 'fromjson(source=None).toxml()'
Traceback (most recent call last):
  File "/home/juarezr/.virtualenvs/py_petl_v175/bin/petl", line 25, in <module>
    r = eval(expression)
        ^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
AttributeError: 'JsonView' object has no attribute 'toxml'. Did you mean: 'tohtml'?
