Abstract:
Methods, systems, and apparatus are described herein that include processing a data stream as a sequence of batch jobs while data in the data stream is being collected. Processing of successive batch jobs in the sequence includes creating a particular batch job upon completion of processing of the preceding batch job in the sequence. The particular batch job has a batch size that depends upon the amount of data in the data stream that has been collected since creation of the preceding batch job, such that the batch size of the particular batch job self-adjusts to data rate changes in the data stream. The particular batch job is then processed to produce resulting data, where both processing efficiency and processing time for the particular batch job increase with the batch size.
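The self-adjusting behavior described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the described implementation: the function name `simulate_batches`, the linear cost model (a fixed per-job overhead `fixed_cost` plus `per_item_cost` per item), and the arrival timestamps are all hypothetical. The cost model is chosen only because it makes both processing time and efficiency (items processed per unit time) grow with batch size, as the abstract states.

```python
from bisect import bisect_right


def simulate_batches(arrival_times, fixed_cost=1.0, per_item_cost=0.1):
    """Simulate self-adjusting batch jobs over a stream of item arrivals.

    Assumed (hypothetical) cost model: a job of n items takes
    fixed_cost + per_item_cost * n time units to process.
    Returns a list of (creation_time, batch_size) pairs.
    """
    arrivals = sorted(arrival_times)
    batches = []
    taken = 0   # number of items already assigned to some batch job
    now = 0.0   # simulated clock

    while taken < len(arrivals):
        # Count every item collected up to the current time.
        collected = bisect_right(arrivals, now)
        if collected == taken:
            # Nothing new has been collected; idle until the next arrival.
            now = arrivals[taken]
            continue

        # The new job takes everything collected since the preceding
        # job was created, so its size tracks the current data rate.
        size = collected - taken
        batches.append((now, size))
        taken = collected

        # The next job is created only when this one finishes processing.
        now += fixed_cost + per_item_cost * size

    return batches


if __name__ == "__main__":
    # Slow arrivals (every 4 time units), then fast arrivals (every 1).
    arrivals = [0, 4, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    for created, size in simulate_batches(arrivals, fixed_cost=2, per_item_cost=1):
        print(f"job created at t={created}: {size} item(s)")
```

In this sketch, batch jobs stay at one item while arrivals are slow and grow once the arrival rate rises, because more data accumulates while each job is being processed. No batch size is configured anywhere; it follows from the data rate and the processing time of the preceding job.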