Does anyone know how the `values` array in the Re...
# general
s
Does anyone know how the `values` array in the Reduce stage of a map/reduce script is built? It does not follow the order written by the Map stage. In my case the order seems to be driven by something else.
s
When your map stage calls `context.write`, you pass it two arguments: a key and a value. The reduce stage then receives all values that share the same key together. The order of the keys shouldn’t matter (if it does, then Map/Reduce is not the right script type to use). You will get one key and an array of the values associated with that key. It may be an array of only one value. For example, if map makes the following context writes (usually across several instances):
context.write('fruits', 'apple');
context.write('vegetables', 'carrot');
context.write('fruits', 'banana');
context.write('nuts', 'walnut');
context.write('fruits', 'orange');
context.write('vegetables', 'turnip');
Then your reduce phase will only have three keys: `'fruits'`, `'vegetables'`, and `'nuts'`, and will only run three times. When reduce handles the `'fruits'` key, the values will be `['apple', 'banana', 'orange']`, though they arrive serialized as JSON strings that have to be parsed first. Again, the order of the keys and values shouldn’t really matter. If order does matter, you’ll either need to sort the values appropriately, or use a scheduled script that can process everything in a particular order.
Since you can’t control the order in which the map instances write, you won’t be able to predict the order of the values in reduce.
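To make the grouping concrete, here is a rough sketch in plain JavaScript of what happens between map and reduce. It is only an illustration: `groupByKey` is a made-up helper, not a NetSuite API, and it keeps values in arrival order, whereas in a real run the map instances execute in parallel, so that arrival order is not something you control.

```javascript
// Illustration only: groupByKey is a hypothetical helper, not part of SuiteScript.
// It collects every value written under the same key into one array,
// which is roughly what reduce sees as context.values for that key.
function groupByKey(writes) {
  const grouped = {};
  for (const [key, value] of writes) {
    if (!grouped[key]) {
      grouped[key] = [];
    }
    grouped[key].push(value);
  }
  return grouped;
}

// The same six writes as in the example above, as [key, value] pairs.
const writes = [
  ['fruits', 'apple'],
  ['vegetables', 'carrot'],
  ['fruits', 'banana'],
  ['nuts', 'walnut'],
  ['fruits', 'orange'],
  ['vegetables', 'turnip'],
];

const grouped = groupByKey(writes);

// One reduce invocation per distinct key.
console.log(Object.keys(grouped));
console.log(grouped.fruits);
```

If the writes arrive in a different order, the keys stay the same but the values inside each array come out in a different order, which is why reduce should not rely on it.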
s
I see. I had a bunch of transactions with different dates. My map stage grouped the transactions by bank account as the key, with the transaction data as the values. In the reduce stage I wanted the transaction data in date-ascending order, but it was all over the place. I ended up having to sort in the reduce stage instead.
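For anyone landing here later, sorting inside reduce can be sketched roughly like this. The `date` and `id` field names, the ISO `YYYY-MM-DD` date format, and the `sortValuesByDate` helper are all assumptions for illustration; real transaction dates often come back formatted per account preferences and would need to be parsed accordingly.

```javascript
// Hypothetical helper: parses each serialized value and sorts by date ascending.
// Assumes every value is a JSON string with an ISO-formatted `date` field.
function sortValuesByDate(values) {
  return values
    .map((value) => JSON.parse(value))
    .sort((a, b) => new Date(a.date) - new Date(b.date));
}

// Sketch of a reduce entry point: context.key is the bank account,
// context.values holds the serialized transactions in no particular order.
function reduce(context) {
  const transactions = sortValuesByDate(context.values);
  transactions.forEach((txn) => {
    // Process each transaction here, now in date order.
  });
  return transactions;
}

// Stand-in for the context object that reduce would receive.
const sampleContext = {
  key: 'bank-account-1',
  values: [
    '{"id":3,"date":"2023-03-15"}',
    '{"id":1,"date":"2023-01-10"}',
    '{"id":2,"date":"2023-02-20"}',
  ],
};

const sortedTransactions = reduce(sampleContext);
console.log(sortedTransactions.map((txn) => txn.id));
```

Sorting in reduce keeps the fix local to one key's values, so it does not depend on how the map instances happened to be scheduled.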
Thanks @scottvonduhn, what you said is pretty much in line with what I experienced.