Investigate support for large(r) files #15
Labels: help wanted
First test with the results of the 2018 StackOverflow survey (~186 MB, 98856 rows) took 383.422 seconds:

```js
const Papa = require('papaparse')
const fs = require('fs')

const file = fs.createReadStream('./huge.csv')
const start = Date.now()

Papa.parse(file, {
  header: true,
  skipEmptyLines: true,
  // Called once for every parsed row
  step: function (row) {
    console.log("Row:", row.data)
  },
  complete: function () {
    const duration = (Date.now() - start) / 1000
    console.log('Reading the file took ' + duration + ' seconds')
  }
})
```
Using chunk instead of step took 360.858 seconds:

```js
const Papa = require('papaparse')
const streamFile = require('./streamFile')

const start = Date.now()
const file = streamFile('./huge.csv')

Papa.parse(file, {
  header: true,
  skipEmptyLines: true,
  /* step: function (row) {
    console.log("Row:", row.data)
  }, */
  // Called once per chunk of rows instead of once per row
  chunk: function (chunk) {
    console.log(chunk)
  },
  complete: function () {
    const duration = (Date.now() - start) / 1000
    console.log('Reading the file took ' + duration + ' seconds')
  }
})
```
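In both runs a large share of the measured time is probably the per-row/per-chunk console.log rather than the parsing itself. A minimal sketch of the same benchmark that only counts rows (the counter variable and file path are illustrative) might isolate the parse time:

```js
const Papa = require('papaparse')
const fs = require('fs')

const file = fs.createReadStream('./huge.csv')
const start = Date.now()
let rowCount = 0

Papa.parse(file, {
  header: true,
  skipEmptyLines: true,
  step: function () {
    // Count rows instead of logging each one to the console
    rowCount++
  },
  complete: function () {
    const duration = (Date.now() - start) / 1000
    console.log('Parsed ' + rowCount + ' rows in ' + duration + ' seconds')
  }
})
```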
Ah, I just created #23. Maybe remove that one and copy over the ideas here?
Maybe we can try, say, 200 lines at a time... row by row is still going to kill, especially on non-SSD disks. 😓
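A minimal sketch of that batching idea, assuming a hypothetical processBatch function that consumes an array of roughly 200 parsed rows at a time:

```js
const Papa = require('papaparse')
const fs = require('fs')

const BATCH_SIZE = 200
let batch = []

// Placeholder for whatever actually consumes the rows (hypothetical)
function processBatch (rows) {
  console.log('Processing ' + rows.length + ' rows')
}

Papa.parse(fs.createReadStream('./huge.csv'), {
  header: true,
  skipEmptyLines: true,
  step: function (row) {
    batch.push(row.data)
    if (batch.length >= BATCH_SIZE) {
      processBatch(batch)
      batch = []
    }
  },
  complete: function () {
    // Flush any remaining rows that did not fill a full batch
    if (batch.length > 0) {
      processBatch(batch)
    }
  }
})
```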
Streams should be helpful for handling larger files.
Papa Parse, the CSV parser used in this project, already has support for streams built in.
Files could then be handled row by row instead of "all or nothing".
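If the installed Papa Parse version supports Node stream input (Papa.NODE_STREAM_INPUT, available in recent releases; worth verifying against the version in use), the parser itself can be used as a duplex stream and rows consumed through ordinary stream events. A rough sketch:

```js
const Papa = require('papaparse')
const fs = require('fs')

// Papa.parse returns a duplex stream when given NODE_STREAM_INPUT
const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {
  header: true,
  skipEmptyLines: true
})

fs.createReadStream('./huge.csv').pipe(parseStream)

parseStream.on('data', function (row) {
  // Each 'data' event is one parsed row (an object, because header: true)
  console.log(row)
})

parseStream.on('end', function () {
  console.log('Done')
})
```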