java.io.IOException: No available bytes in the input stream #53

Open
wessankey opened this issue Jul 26, 2019 · 7 comments

@wessankey

wessankey commented Jul 26, 2019

I'm running into the following exception when attempting to process a file:

java.io.IOException: There are no available bytes in the input stream.
  at com.epam.parso.impl.SasFileParser.getBytesFromFile(SasFileParser.java:768)
  at com.epam.parso.impl.SasFileParser.readSubheaderSignature(SasFileParser.java:423)
  at com.epam.parso.impl.SasFileParser.processPageMetadata(SasFileParser.java:392)
  at com.epam.parso.impl.SasFileParser.processNextPage(SasFileParser.java:591)
  at com.epam.parso.impl.SasFileParser.readNextPage(SasFileParser.java:561)
  at com.epam.parso.impl.SasFileParser.readNext(SasFileParser.java:519)
  at com.epam.parso.impl.SasFileReaderImpl.readNext(SasFileReaderImpl.java:168)
  ... 57 elided

The error occurs after processing approximately 400,000 rows, and the file has several million. My code is below:

import java.io.FileInputStream
import com.epam.parso.impl.SasFileReaderImpl

// Open the file and read the row count from its header metadata.
val sasFileReader = new SasFileReaderImpl(new FileInputStream("test.sas7bdat"))
val numRows = sasFileReader.getSasFileProperties().getRowCount()
var currentRowNum = 0L

// Read one row at a time and print its values separated by "|".
while (currentRowNum < numRows) {
    val currentRow = sasFileReader.readNext()
    currentRow.foreach(c => print(c + "|"))
    currentRowNum += 1
}

Environment details:
I'm running this on an EMR cluster with Scala 2.11.

@Yana-Guseva
Collaborator

Hi @westonsankey, thank you for reporting this. Is there any way you could share the source file with us? Thanks.

@wessankey
Author

@Yana-Guseva - I am unable to provide the SAS source file.

I created an equivalent program in Java and got the same exception.

@Yana-Guseva
Collaborator

Which version of Parso are you using? Do you know whether this file was created by the SAS platform itself, and are you sure it contains no errors?

It seems that, for one of the subheaders, this file contains an offset value that points beyond the page boundary.
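
To illustrate what such an out-of-range offset would mean, here is a minimal sketch. It assumes each subheader is described by a pointer holding an offset and a length within its page; `SubheaderPointer`, `PageBounds`, and `fitsInPage` are hypothetical names for illustration, not Parso's actual types or checks.

```scala
// Hypothetical model of a subheader pointer: where the subheader starts
// within its page and how many bytes it spans. Not Parso's actual types.
final case class SubheaderPointer(offset: Long, length: Long)

object PageBounds {
  // True when the subheader lies entirely inside a page of `pageLength` bytes.
  def fitsInPage(p: SubheaderPointer, pageLength: Long): Boolean =
    p.offset >= 0 && p.length >= 0 && p.offset + p.length <= pageLength
}
```

With a 65536-byte page, a pointer at offset 65000 with length 1024 would end at byte 66024, so a read at that position could run out of bytes, which is consistent with the exception above but not confirmed as its cause.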

@wessankey
Author

I've tested with versions 2.0.9, 2.0.10, and 2.0.11. I was able to parse the file successfully using the Python sas7bdat library. I'm not entirely sure what you mean by the offset value going beyond the page boundaries for one of the subheaders, as I don't have much experience with the SAS format.

@saurabhvermaabd98

In my case, this happens with the SASYZCR2 compression type.

@PCaff
Collaborator

PCaff commented Feb 19, 2020

@saurabhvermaabd98 could you please provide a test file that reproduces this issue?

@printsev
Contributor

@westonsankey @saurabhvermaabd98, could you please try Parso 2.0.12? Your file might contain deleted rows, and the handling of those was improved in 2.0.12. Also, if you have a test file, it would be very helpful if you could share it with us. In the meantime I will add the "nodataset" label, as it is hard, if not impossible, for us to fix the issue without a dataset.
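
If the file itself cannot be shared, the row index at which reading fails is still useful to report. Below is a minimal sketch of a read loop that records it. `ReadDiagnostics`, `ReadReport`, and `drainRows` are hypothetical helper names, and the loop assumes the reader signals the end of the data by returning null, which should be checked against the Parso version in use.

```scala
import java.io.IOException

// Outcome of draining a row source: rows read so far, plus the failure if any.
final case class ReadReport(rowsRead: Long, failure: Option[IOException])

object ReadDiagnostics {
  // `nextRow` is a stand-in for `() => sasFileReader.readNext()`.
  // Assumption: the reader returns null once there are no more rows.
  def drainRows(nextRow: () => Array[AnyRef])(handle: Array[AnyRef] => Unit): ReadReport = {
    var rowsRead = 0L
    try {
      var row = nextRow()
      while (row != null) {
        handle(row)
        rowsRead += 1
        row = nextRow()
      }
      ReadReport(rowsRead, None)
    } catch {
      // Keep the count of rows read before the failure alongside the exception.
      case e: IOException => ReadReport(rowsRead, Some(e))
    }
  }
}

// Usage with Parso (not executed here):
// val reader = new SasFileReaderImpl(new FileInputStream("test.sas7bdat"))
// val report = ReadDiagnostics.drainRows(() => reader.readNext())(row => println(row.mkString("|")))
// println(s"rows read: ${report.rowsRead}, failure: ${report.failure}")
```

Comparing `rowsRead` with the value of `getRowCount()` across Parso versions would show whether the failure always occurs at the same row.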
