# suitescript
k
i use file.lines.iterator() to iterate through all the lines of a 10 MB (~50,000 line) csv file in my map/reduce script, but it is significantly slow, i.e. taking more than 20 minutes at least. is that expected, or does that sound specific to my NetSuite account?
b
how long would you expect it to take?
How many lines would you expect to process per second?
k
i don't have much experience with netsuite, but 1 ms per line sounds fair
b
i would only expect millisecond-level performance if your script doesn't use any suitescript apis
except maybe a log
s
Are you seeing slowness in processing each line, or is it a long delay between getInputData and map?
k
the job was created at ~2:45 PM and it is still running. i haven't written a log from within the lines.iterator().each function to visualize the elapsed time; i can try that
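(A rough sketch of what that timing log could look like inside getInputData, assuming a SuiteScript 2.x map/reduce; the file path and the 5,000-line logging interval are made up for illustration.)

```javascript
/**
 * @NApiVersion 2.x
 * @NScriptType MapReduceScript
 */
define(['N/file', 'N/log'], function (file, log) {

    function getInputData() {
        var csvFile = file.load({ id: 'SuiteScripts/import.csv' }); // hypothetical file path
        var start = Date.now();
        var lineCount = 0;

        csvFile.lines.iterator().each(function (line) {
            lineCount++;
            if (lineCount % 5000 === 0) {
                // log elapsed time every 5,000 lines to see where the time is going
                log.audit('progress', lineCount + ' lines in ' + (Date.now() - start) + ' ms');
            }
            return true; // returning true keeps the iterator going
        });

        // ...build and return the input data here...
    }

    return { getInputData: getInputData };
});
```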
s
Yes, it appears that getInputData() is serial - as in it gathers up all your results before it sends anything to map()
That is, map() runs in parallel, but not until getInputData() has finished entirely. Would love to hear advice to the contrary, but that's been my empirical observation.
that can be a pain, and indeed I've sometimes found a scheduled script outperforms a MR because it can start making progress immediately.
k
does that mean that due to the large amount of data returned from the getInputData stage, there is a long delay between getInputData and map?
s
Yes - I had a similar problem where I had a large search - the search itself would eventually time out when run in getInputData in a MR script, so it never even reached map.
I expected MR scripts to feed map() data in parallel and incrementally while the data was being returned from getInputData(), but it doesn't seem to operate in that sort of 'streaming' fashion.
In my case, it seemed clear that even though my getInputData returned a search reference, behind the scenes it must have been executing the search and trying to load ALL the results from the entire search. Perhaps the same is happening with ALL the lines from the file you're iterating?
k
i simply -- 1. have a global array variable to accumulate the records, 2. load the file and iterate over each line, converting it to JSON and pushing it into the array, 3. return the array at the end
i'm surprised map/reduce doesn't work well for that size, since it's only 50,000 lines. what i have found is that people are satisfied with the results when they process > 100,000 records within one map/reduce instance
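(A minimal sketch of the approach described above, for clarity; the file path and the id/amount column names are assumptions, not from the original script. Note everything is accumulated in memory before getInputData returns.)

```javascript
/**
 * @NApiVersion 2.x
 * @NScriptType MapReduceScript
 */
define(['N/file', 'N/log'], function (file, log) {

    function getInputData() {
        var records = [];                                            // accumulator held in memory
        var csvFile = file.load({ id: 'SuiteScripts/import.csv' });  // hypothetical file path

        csvFile.lines.iterator().each(function (line) {
            var cols = line.value.split(',');
            // convert each CSV line into an object and accumulate it
            records.push({ id: cols[0], amount: cols[1] });          // made-up column names
            return true;                                             // keep iterating
        });

        return records;  // the entire array is returned at once
    }

    function map(context) {
        // each array element arrives serialized as a string
        var record = JSON.parse(context.value);
        log.debug('map', record.id);
        // ...process a single record...
    }

    return { getInputData: getInputData, map: map };
});
```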
b
unfavorable memory-wise
you max out at 50 MB of memory
and your processing is probably not helping
k
hmm the file size itself is only 10 MB, even though it has 50,000 lines
b
return the file object in getInputData and do your processing in map
you are basically generating a large object in memory and the garbage collector is probably panicking
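(A hedged sketch of that suggestion: return the file object from getInputData and parse each line in map. My understanding is that the framework then passes each line of the file to map as context.value; the file path and column names below are placeholders.)

```javascript
/**
 * @NApiVersion 2.x
 * @NScriptType MapReduceScript
 */
define(['N/file', 'N/log'], function (file, log) {

    function getInputData() {
        // Return the file object itself; each line is handed to map()
        // by the framework, so nothing is accumulated here.
        return file.load({ id: 'SuiteScripts/import.csv' });  // hypothetical file path
    }

    function map(context) {
        // context.value is one raw CSV line
        var cols = context.value.split(',');
        var record = { id: cols[0], amount: cols[1] };  // made-up column names
        log.debug('map', record.id);
        // ...process the single record here...
    }

    return { getInputData: getInputData, map: map };
});
```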
k
ah i see, interesting, let me try it out
🙏