Extension for TypedPipe that adds a cumulativeSum method.
Given a TypedPipe with T = (GroupField, (SortField, SummableField)),
cumulativeSum returns a SortedGrouped in which the SummableField is accumulated
within each group, in the order of the SortField.
For example:
('San Francisco', (100, 100)),
('San Francisco', (101, 50)),
('San Francisco', (200, 200)),
('Vancouver', (100, 50)),
('Vancouver', (101, 300)),
('Vancouver', (200, 100))
becomes
('San Francisco', (100, 100)),
('San Francisco', (101, 150)),
('San Francisco', (200, 350)),
('Vancouver', (100, 50)),
('Vancouver', (101, 350)),
('Vancouver', (200, 450))
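
The accumulation described above can be modelled on an in-memory collection. The following is a minimal sketch in plain Scala, not Scalding's implementation: cumulativeSumLocal and CumulativeSumSketch are hypothetical names introduced here, and the summable field is fixed to Long for simplicity.

```scala
object CumulativeSumSketch {
  // Hypothetical helper (not part of Scalding): models the documented
  // behaviour on an in-memory Seq instead of a TypedPipe.
  def cumulativeSumLocal[K, U: Ordering](rows: Seq[(K, (U, Long))]): Map[K, Seq[(U, Long)]] =
    rows.groupBy(_._1).map { case (key, group) =>
      // Order the entries of this group by the sort field.
      val sorted = group.map(_._2).sortBy(_._1)
      // scanLeft produces the running totals; tail drops the initial 0L.
      val running = sorted.map(_._2).scanLeft(0L)(_ + _).tail
      key -> sorted.map(_._1).zip(running)
    }

  // The input rows from the example above.
  val input: Seq[(String, (Int, Long))] = Seq(
    ("San Francisco", (100, 100L)),
    ("San Francisco", (101, 50L)),
    ("San Francisco", (200, 200L)),
    ("Vancouver", (100, 50L)),
    ("Vancouver", (101, 300L)),
    ("Vancouver", (200, 100L))
  )

  val result: Map[String, Seq[(Int, Long)]] = cumulativeSumLocal(input)
}
```

Here CumulativeSumSketch.result("Vancouver") holds the running totals 50, 350 and 450 for the sort fields 100, 101 and 200.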
If you pass cumulativeSum a partition function, you get the same result,
but the work for a single group can be spread over more than one reducer.
This is useful when a single group has a very large number of entries.
In the example above, a partition function of the form { _ / 100 },
applied to the sort field, ensures that no single reducer handles more
than 2 entries.
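
One way to see why partitioning leaves the result unchanged is sketched below in plain Scala. This is an illustrative model under stated assumptions, not Scalding's implementation: cumulativeSumPartitioned and PartitionedCumulativeSumSketch are hypothetical names, the summable field is fixed to Long, and the partition function is assumed to be monotonic in the sort field, so that every entry of an earlier partition sorts before every entry of a later one.

```scala
object PartitionedCumulativeSumSketch {
  // Hypothetical helper (not part of Scalding). Assumes `partition` is
  // monotonic in the sort field.
  def cumulativeSumPartitioned[K, U: Ordering, S: Ordering](
      rows: Seq[(K, (U, Long))]
  )(partition: U => S): Map[K, Seq[(U, Long)]] =
    rows.groupBy(_._1).map { case (key, group) =>
      // Step 1: split the group by partition; each partition could be
      // handled by a separate reducer.
      val byPartition: Seq[(S, Seq[(U, Long)])] =
        group.map(_._2)
          .groupBy { case (sortField, _) => partition(sortField) }
          .toSeq
          .sortBy { case (part, _) => part }
      // Step 2: the offset of a partition is the total of all earlier partitions.
      val totals = byPartition.map { case (_, entries) => entries.map(_._2).sum }
      val offsets = totals.scanLeft(0L)(_ + _)
      // Step 3: running sum inside each partition, starting from its offset.
      val summed = byPartition.zip(offsets).flatMap { case ((_, entries), offset) =>
        val sorted = entries.sortBy(_._1)
        sorted.map(_._1).zip(sorted.map(_._2).scanLeft(offset)(_ + _).tail)
      }
      key -> summed
    }

  // The input rows from the example above.
  val input: Seq[(String, (Int, Long))] = Seq(
    ("San Francisco", (100, 100L)),
    ("San Francisco", (101, 50L)),
    ("San Francisco", (200, 200L)),
    ("Vancouver", (100, 50L)),
    ("Vancouver", (101, 300L)),
    ("Vancouver", (200, 100L))
  )

  // Sort fields 100 and 101 fall into partition 1, and 200 into partition 2.
  val result: Map[String, Seq[(Int, Long)]] = cumulativeSumPartitioned(input)(_ / 100)
}
```

With { _ / 100 }, each group is split into a partition of two entries and a partition of one entry, and the offsets carry the earlier totals forward so the per-group running sums match the unpartitioned ones.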