# suitescript
c
Hi Team, I have a question. There are hard limits for each stage of a Map/Reduce script, such as the limit on persisted data; for example, the getInputData stage has a limit of 1 billion instructions. What exactly are these instructions referring to?
b
not that you can measure it, but it probably refers to machine code
if you want to be extra specific, it very likely means Java bytecode
either way, you don't have access to the number of instructions your code is consuming
c
Thanks @battk. How do we make sure we are not crossing this hard limit? If we do, it throws an error in the summarize stage saying the limit was exceeded, and the entire script fails silently.
b
you can't
you can only design your code so that you don't have long-running loops
c
Okay, to give some background: I have a query that returns custom records, each holding a path to an AWS S3 bucket. In getInputData I loop through these custom records, retrieve the CSV file contents from the S3 bucket, convert them to JSON, and pass the data to the reduce stage one object at a time, where a custom record is created. So far there are no issues, so I'm just wondering about the limitations. A rough sketch of that setup is below.
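A minimal sketch of what that getInputData might look like (record and field ids such as customrecord_s3_file and custrecord_s3_url are placeholders, and it assumes the stored S3 path is a pre-signed URL that N/https can fetch directly):

```javascript
/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/query', 'N/https'], (query, https) => {
    const getInputData = () => {
        // Placeholder record/field ids; the query just returns the S3 paths
        const paths = query.runSuiteQL({
            query: 'SELECT id, custrecord_s3_url AS url FROM customrecord_s3_file'
        }).asMappedResults();

        // All of the download and CSV-to-JSON work happens in this one stage,
        // so getInputData carries almost the entire instruction count
        const rows = [];
        paths.forEach((p) => {
            const csv = https.get({ url: p.url }).body;             // fetch CSV from S3
            const lines = csv.split('\n').filter((l) => l.trim());
            const headers = lines[0].split(',');
            lines.slice(1).forEach((line) => {
                const values = line.split(',');
                const obj = { sourceRecordId: p.id };
                headers.forEach((h, i) => { obj[h.trim()] = (values[i] || '').trim(); });
                rows.push(obj);                                      // one object per CSV row
            });
        });
        return rows; // each element is handed to reduce one at a time
    };

    const reduce = (context) => { /* create the custom record from context.values[0] */ };
    const summarize = (summary) => { /* report errors */ };

    return { getInputData, reduce, summarize };
});
```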
b
as a proxy, you can ask yourself if your code will be running a million statements in any of the stages
if it's not, it's unlikely to consume a billion instructions
c
Agreed with your statements 👍🏻
e
If you're running into this limit during `getInputData`, then that stage is doing way too much work. Based on your description, it sounds like `getInputData` should potentially just be retrieving the list of Custom Records to process, the `map` stage should retrieve and convert the CSV file, then the `reduce` stage should process the JSON. That would split up the workload.
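A minimal sketch of that split, assuming the same placeholder record/field ids as above, a pre-signed S3 URL that N/https can fetch, and a hypothetical target record type whose field ids happen to match the CSV headers:

```javascript
/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/query', 'N/https', 'N/record'], (query, https, record) => {

    // getInputData: only return the list of custom records pointing at S3 files
    const getInputData = () => query.runSuiteQL({
        query: 'SELECT id, custrecord_s3_url AS url FROM customrecord_s3_file'
    }).asMappedResults();

    // map: one custom record per invocation - fetch its CSV and emit one row at a time
    const map = (context) => {
        const source = JSON.parse(context.value);
        const csv = https.get({ url: source.url }).body;
        const lines = csv.split('\n').filter((l) => l.trim());
        const headers = lines[0].split(',');
        lines.slice(1).forEach((line, index) => {
            const values = line.split(',');
            const row = {};
            headers.forEach((h, i) => { row[h.trim()] = (values[i] || '').trim(); });
            // unique key per CSV row so reduce receives exactly one object per key
            context.write({ key: `${source.id}_${index}`, value: JSON.stringify(row) });
        });
    };

    // reduce: create one custom record per CSV row
    const reduce = (context) => {
        const row = JSON.parse(context.values[0]);
        const rec = record.create({ type: 'customrecord_csv_row' }); // placeholder type
        Object.keys(row).forEach((field) => {
            rec.setValue({ fieldId: field, value: row[field] });     // assumes field ids match headers
        });
        rec.save();
    };

    // summarize: log stage errors so a failure is not missed
    const summarize = (summary) => {
        summary.mapSummary.errors.iterator().each((key, error) => {
            log.error({ title: `map error for ${key}`, details: error });
            return true;
        });
        summary.reduceSummary.errors.iterator().each((key, error) => {
            log.error({ title: `reduce error for ${key}`, details: error });
            return true;
        });
    };

    return { getInputData, map, reduce, summarize };
});
```

With this shape, each stage's governance allotment covers only its own slice of the work: `getInputData` just runs the query, each `map` invocation handles a single file, and each `reduce` invocation creates a single record.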