# general
m
the problem is i have a very large set of results i’m processing and each result may require significant governance units. apparently map/reduce has a hard per-result cap on governance and it won’t yield automatically if you exceed it, it’ll just throw an error. it only yields if governance is exceeded for the overall phase
b
what kind of hardcore processing are you doing that requires over 5000 points per result
and how much do you potentially need
s
structured right, you could potentially give each "result" quite a lot of governance units. For example, there is no reason that you are limited to just one Reduce phase per result. You could do some pre-processing of the result in the Map phase, and for each unit of work, create a Reduce context to deal with it. I have done this before, using complex keys (some id or value unique to each result + a sequence letter, for example). You might need conditional logic in the Reduce phase to perform different work depending upon the sequence letter in the key, but it is doable. However, if all of the processing for each result has to be performed sequentially, then it won't help, as you can't guarantee the ordering of the reduce phases.
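a rough sketch of the complex-key idea (untested; `splitIntoUnits` and the key format are made up for illustration):
```javascript
// map: break one big result into independent units of work, each written
// under its own compound key so each unit gets its own reduce context
// (and therefore its own governance budget)
function map(context) {
    var result = JSON.parse(context.value);
    var units = splitIntoUnits(result); // hypothetical pre-processing helper
    units.forEach(function (unit, i) {
        context.write({
            key: result.id + String.fromCharCode(65 + i), // "123A", "123B", ...
            value: JSON.stringify(unit)
        });
    });
}

// reduce: branch on the sequence letter if different units need different work
function reduce(context) {
    var sequenceLetter = context.key.slice(-1);
    if (sequenceLetter === 'A') {
        // ... one kind of processing
    } else {
        // ... another
    }
}
```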
m
Ok, thanks for the suggestion. Basically, i’m loading an advanced promotion discount search, going through each result, and if the result uses an item saved search for eligible items, i load that search and store the item names from the search into an array. the problem is there are potentially millions of items that could be returned for one item search per result of the promo discount search
b
how many advanced promotion discounts are there
if there are less than 5 million or so, you can make the getInputData step divide your results into search pages
your map step can get the results for each page
and your reduce can combine the results into your array
m
ok. maybe i’ll try that. thanks!
Is there a way to push all my map/reduce results into an array so i can export them to a file when they’re done? i tried pushing each processed result from the reduce phase to a global variable and then exporting that variable during the summarize phase, but that doesn’t work. i can’t export the processed result during the reduce phase because then that just gives me a separate file per result
b
global variables won’t work (at all honestly)
they don't share the same context
you are supposed to use the summaryContext output for that
my general warning: any of netsuite’s iterator functions require you to return true to continue iterating
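something like this (sketch; the file name and folder id are placeholders, getInputData/map omitted):
```javascript
/**
 * @NApiVersion 2.x
 * @NScriptType MapReduceScript
 */
define(['N/file'], function (file) {
    // getInputData and map omitted for brevity

    function reduce(context) {
        // whatever you write here is what shows up in summaryContext.output
        context.write({ key: context.key, value: JSON.stringify(context.values) });
    }

    function summarize(summary) {
        var all = [];
        summary.output.iterator().each(function (key, value) {
            all.push(JSON.parse(value));
            return true; // forget this and you only ever see the first entry
        });
        var jsonFile = file.create({
            name: 'promo_export.json',   // placeholder name
            fileType: file.Type.JSON,
            contents: JSON.stringify(all)
        });
        jsonFile.folder = 123;           // placeholder folder internal id
        jsonFile.save();
    }

    return { reduce: reduce, summarize: summarize };
});
```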
m
yeah i was having trouble retrieving the output property. i’m new to map/reduce, so i’m sure i’m probably doing something wrong. can i pass stuff from one phase into the summary context and access it through the output property?
b
you can only pass the keys and values you write
what does your attempt to use the output look like
m
i think i couldn’t log it at all
i’m going to take another look though
b
keep in mind that a lot of the stuff in ss2 use getter and setter functions
not everything will log
m
i’ve got a more serious problem at the moment. my reduce phase timed out because i guess the result was too big. i thought i might run into this, which is why i was trying to figure out how to yield in a scheduled script using 2.0. during the reduce phase, if my promo object has an item saved search, i’m running the search and storing the results in an array. but some of these searches pull 2-3 million results. once you hit that per result cap with a map/reduce, the script just fails
i don’t think i can spread out the process in the map phase either. i mean the logic is pretty minimal. i’m passing this promo object from the map phase to reduce. the reduce phase just checks if there’s a saved search id. if there is, it runs the search and returns the results. that’s all i’m having it do. i don’t see how i can break that logic up
b
well, there is no yielding in ss2 scheduled scripts
m
*can’t
yeah that’s what i hear
makes it tough though if one result happens to be huge. might have to write this in 1.0
b
i still say do the promo search in getInputData and have it get all the searches you have to run
you can make it so that the object you return has keys of the search to run and values of an array of pages for the search
your map writes the results of each page
and your reduce has 5000 points to combine your search results
m
can getInputData handle that much though? the logic from the top down is: run a promo search that may have around 100 results, push the fields i need into an object, check if it’s got an item saved search, if it does, push all the items into an array and store that in the original object, then at the end push all the results into an array and create a json file
if i search for my promos, and each result of that search may have an item search … that just seems like a lot of searching for the getInput phase
b
100 searches at 5 points each is 500 points
you don’t need to fetch the results
sorry
m
right. but what if one promo result has an item search that contains a few million
b
bad math
you don’t run the item search
m
no?
b
use Search.runPaged to get a PagedData object
it costs 5 points
it tells you meta information about the search results
you would be interested in the pageRanges, which tell you how to fetch the results
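e.g. (sketch; the search id is a placeholder and it assumes N/search is imported as `search`):
```javascript
var itemSearch = search.load({ id: 'customsearch_items' }); // placeholder id
var pagedData = itemSearch.runPaged({ pageSize: 1000 });    // 5 points, no matter how many results

log.debug('total results', pagedData.count);
log.debug('page count', pagedData.pageRanges.length);
// each entry in pagedData.pageRanges has an index you can later
// hand to pagedData.fetch({ index: ... }) to pull that one page
```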
m
ok. so you’re saying get pagedData objects for all my item searches during the getInput phase, then fetch the results in another phase?
b
correct
m
gotcha. ok, i may try that. thanks!
b
you would need to split the data so that each key processed by a map would represent one page of data
i guess you could do multiple pages per key if you really want
but you would want to fit within the 1000 points a map invocation gets
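so getInputData would look something like this (sketch; the search and field ids are placeholders):
```javascript
function getInputData() {
    var keys = {};
    search.load({ id: 'customsearch_promo_discounts' }) // placeholder promo search
        .run()
        .each(function (result) {
            var itemSearchId = result.getValue({ name: 'custrecord_item_search' }); // placeholder field
            if (itemSearchId) {
                // 5 points to load + 5 to runPaged, per promo that has an item search
                var pagedData = search.load({ id: itemSearchId }).runPaged({ pageSize: 1000 });
                pagedData.pageRanges.forEach(function (range) {
                    // one page of one search per map key
                    keys[itemSearchId + '|' + range.index] = '';
                });
            }
            return true; // keep iterating
        });
    return keys; // each key becomes one map invocation
}
```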
m
fyi i don’t think getting the paged data for the item searches ahead of time during getInput will work, because when you pass that data into the other phases it gets stringified. you can still read plain properties like the page ranges off the object, but methods like fetch don’t survive serialization, so you can’t use them to retrieve fields and such
b
plan on getting a new PagedData each time and using the stringified pageRange to tell which index to fetch
should work almost the same, unless search results changing between getInputData and map is a real concern
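map would then be something like this (sketch; same placeholder ids and key format as above):
```javascript
function map(context) {
    var parts = context.key.split('|');
    var searchId = parts[0];
    var pageIndex = parseInt(parts[1], 10);

    // the PagedData from getInputData lost its methods when it was
    // stringified, so get a fresh one here: 5 points to load, 5 to run
    var pagedData = search.load({ id: searchId }).runPaged({ pageSize: 1000 });
    var page = pagedData.fetch({ index: pageIndex }); // 5 points for this one page

    page.data.forEach(function (result) {
        // group every page of the same search under one reduce key
        context.write({
            key: searchId,
            value: result.getValue({ name: 'itemid' }) // placeholder column
        });
    });
}
```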
m
Yeah i don’t think map/reduce is going to work. The reduce phase has a 5k per result limit. Let’s say all i’m doing in that phase is fetching all my paged data. Some of these results may have 2-3 million lines in the search. Take 2.5 million as an example … at 1k lines per page that’s 2,500 pages i’ll have to fetch. A single fetch() is 5 units. That means to fetch 2,500 pages it will cost me 12.5k units, far exceeding the reduce phase limit. I’m thinking i may have to script this in 1.0
b
i’ve been trying to steer you to fetch individual pages in the map phase instead of getting all the pages in the reduce phase, specifically to avoid that problem
m
Yeah but the map phase has a significantly lower limit, only 1k
per result
that means i can only do 200 fetches before i hit the limit
i may need 2,500 fetches
b
individual pages
fetch 1 page per map key
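and then the reduce barely costs anything (sketch):
```javascript
function reduce(context) {
    // context.values already holds every item from every page of this
    // search; no search calls needed here, just combine and pass along
    context.write({
        key: context.key,                     // the item saved search id
        value: JSON.stringify(context.values) // full combined item list
    });
}
```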