# suitescript
f
Curious about M/R script overhead. We're doing a scrubbing project on a system. Using a Map/Reduce script, I was able to send data to a Business Logic module and update ~3600 records an hour (about 1/sec). Using the same Business Logic module and calling it from a Restlet, I was able to update 50K records an hour. I've disabled everything on the system. It makes no sense why it would be faster via the Restlet. Wondering if anyone else has had a similar experience. The way I counted was by the number of records updated.
e
Curious when you say Business Logic module, is that an external system that you're making an API call to?
Also, what is the concurrency limit on the M/R script?
m
On the records you're updating, if you have user event scripts that run in the Map/Reduce context but those same scripts don't run in the Restlet context, then the Restlet method is doing way less work. It's still updating the records, but all of the work for the user event scripts tied to those records may not be firing.
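For illustration, here is a minimal sketch of how that difference can happen in code (as opposed to deployment context filtering). The script and its gating are hypothetical, not from the original thread; it just shows a user event script that skips its heavy work when the update comes from a Restlet but still runs it for Map/Reduce updates.

```javascript
/**
 * @NApiVersion 2.1
 * @NScriptType UserEventScript
 */
// Hypothetical user event script illustrating how context gating can make the
// same record update cheaper from a Restlet than from a Map/Reduce script.
define(['N/runtime'], (runtime) => {
    const afterSubmit = (context) => {
        // Skip the expensive logic when the update comes from a Restlet,
        // but let it run for Map/Reduce (and UI) updates.
        if (runtime.executionContext === runtime.ContextType.RESTLET) {
            return;
        }
        // ...expensive recalculations, integrations, etc. would fire here...
    };
    return { afterSubmit };
});
```

The same effect can also come from the deployment's Context Filtering subtab, so both are worth checking when comparing throughput between entry points.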
f
@eblackey they're just module scripts to keep the logic isolated from the entry points and data access. Makes for easier testing and better readability, in my opinion. We access the business module file by calling it from any entry point (Restlet, Scheduled, Map/Reduce). Concurrency on the M/R is 15, but you can oversubscribe them to 30.
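A minimal sketch of that shared-module pattern, with a hypothetical file name and function (not the poster's actual code): a custom module holds the update logic, and the Restlet and Map/Reduce entry points just require it.

```javascript
/**
 * @NApiVersion 2.1
 * @NModuleScope SameAccount
 */
// Hypothetical shared business logic module (e.g. /SuiteScripts/lib/bl_scrub.js)
// required by both the Restlet and the Map/Reduce entry points.
define(['N/record'], (record) => {
    // Apply the scrub/update to a single record and return the saved id.
    const scrubRecord = (recordType, recordId, values) => {
        return record.submitFields({
            type: recordType,
            id: recordId,
            values: values,
            options: { enableSourcing: false, ignoreMandatoryFields: true }
        });
    };
    return { scrubRecord };
});
```

Each entry point would then do something like `define(['./lib/bl_scrub'], (bl) => { ... bl.scrubRecord(type, id, values); ... })`, keeping the data-access logic identical across script types.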
@Mike Robbins I think I've scoured the UE scripts already, but you're making me think I may need to take a closer look. Thanks for the nudge.
s
Well, it has been years since I did my own performance test / comparison, but back in 2014 I determined that Restlet scripts did indeed perform faster than just about any other script or integration method at the time (faster than scheduled scripts, SuiteTalk Web Services, CSV imports, etc.). Of course, at that time it was before SuiteScript 2.x came out, so Map/Reduce wasn't an option and SuiteTalk REST wasn't available yet either. But over the years we have created many Restlets and the throughput we see with them is impressive, so this doesn't surprise me.

However, we also use Map/Reduce scripts a lot, and they can perform very well, but we have noticed that the getInputData stage can actually end up being the bottleneck for a Map/Reduce script when the data to retrieve is very large, like over 30,000 records. For some reason, the time it takes to get all of the data and send it to the next phase does not scale well above a certain point, and you may find that 45,000 records could take more than twice the time of 30,000 records in the getInputData phase (that's just an example, it is going to depend on a lot of factors specific to your account and your data). But I will say if you are dealing with tens of thousands of records or more, it is worth experimenting with limiting the GID phase to a certain amount, and seeing where the sweet spot is for that script. I have to limit most of mine to the 30-40 thousand range, as that seems to be where we get the best throughput before it degrades.

As mentioned by others, workflows, user events, and even client scripts (yes, client scripts can run server-side!) can all fire for certain contexts that might only affect one script type but not another, so that's worth a look too, to make sure you are doing a real comparison of the scripts, and not of other customizations being triggered by them.
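One way to cap the getInputData stage, sketched below under assumptions (the saved search id and the batch-size script parameter are hypothetical): instead of returning the search object directly, materialize results page by page and stop at the configured limit, so each execution only feeds a bounded batch into map.

```javascript
/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
// Sketch of capping getInputData to a configurable batch size.
// 'customsearch_scrub_candidates' and 'custscript_scrub_batch_limit' are assumptions.
define(['N/search', 'N/runtime'], (search, runtime) => {
    const getInputData = () => {
        const batchLimit = parseInt(
            runtime.getCurrentScript().getParameter({ name: 'custscript_scrub_batch_limit' }),
            10
        ) || 30000;

        const pagedData = search.load({ id: 'customsearch_scrub_candidates' })
            .runPaged({ pageSize: 1000 });

        const results = [];
        for (const pageRange of pagedData.pageRanges) {
            if (results.length >= batchLimit) break;
            pagedData.fetch({ index: pageRange.index }).data.forEach((result) => {
                if (results.length < batchLimit) {
                    results.push({ id: result.id, type: result.recordType });
                }
            });
        }
        return results; // bounded array instead of the full search
    };

    const map = (context) => {
        // JSON.parse(context.value) -> { id, type }; hand off to the shared
        // business logic module here.
    };

    return { getInputData, map };
});
```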
f
From what I am seeing, the M/R is bottlenecking on the overhead of managing a large dataset (> 1M records). Running and managing this scheduling and coordination outside of NetSuite works well.
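A rough sketch of what external coordination could look like, under assumptions (Node.js 18+, a placeholder Restlet URL, and a placeholder Authorization header; a real call needs OAuth 1.0 / Token-Based Authentication signing): slice the id list into batches and post each batch to the Restlet.

```javascript
// Hypothetical external coordinator that batches record ids and posts them
// to the Restlet. URL and Authorization header are placeholders.
const RESTLET_URL = 'https://ACCOUNT.restlets.api.netsuite.com/app/site/hosting/restlet.nl?script=123&deploy=1';

async function postBatch(ids) {
    const response = await fetch(RESTLET_URL, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            Authorization: 'OAuth ...signed TBA header goes here...'
        },
        body: JSON.stringify({ recordIds: ids })
    });
    if (!response.ok) throw new Error(`Restlet call failed: ${response.status}`);
    return response.json();
}

async function run(allIds, batchSize = 200) {
    for (let i = 0; i < allIds.length; i += batchSize) {
        const batch = allIds.slice(i, i + batchSize);
        // Sequential here; could be parallelized up to the account's concurrency limit.
        await postBatch(batch);
    }
}
```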
s
Exactly, this tracks with my experience as well. While the later stages perform very well with any amount of data, the first stage needs to be tuned/optimized. One thing to do is trim out any columns or fields from your search or query that aren't used. Alternatively, running multiple executions with fewer results each is what works for us. We use a control script to manage that in some cases.
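For reference, a control script of that kind could look roughly like the sketch below. The script ids, deployment id, and parameter name are assumptions, not the poster's actual setup; it just shows a scheduled script re-submitting the Map/Reduce with a capped batch via N/task.

```javascript
/**
 * @NApiVersion 2.1
 * @NScriptType ScheduledScript
 */
// Hypothetical control script that submits the Map/Reduce with a capped batch,
// so each execution only handles part of the dataset.
define(['N/task'], (task) => {
    const execute = () => {
        const mrTask = task.create({
            taskType: task.TaskType.MAP_REDUCE,
            scriptId: 'customscript_scrub_mr',
            deploymentId: 'customdeploy_scrub_mr',
            params: { custscript_scrub_batch_limit: 30000 }
        });
        const taskId = mrTask.submit();
        log.audit('Submitted scrub batch', taskId);
        // A real control script would also check task.checkStatus(...) before
        // submitting the next batch, or let the M/R summarize stage re-trigger it.
    };
    return { execute };
});
```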