Batch Processing vs. Stream Processing: A Beginner’s Guide

Sayan Chowdhury
Jun 2, 2022

Copyright: Sayan Chowdhury

In the big data world, batch processing and stream processing are two fundamental approaches to processing data.

Batch Processing:

Batch processing is typically used when dealing with extremely large amounts of data and/or when data sources are legacy systems incapable of delivering data in streams.

Data produced on mainframes is a typical example of data processed in batch mode. Accessing and integrating mainframe data into modern analytics environments takes time, which makes streaming it infeasible in most cases.

Bill processing is another common example: bills are typically handled in batches.
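To make the batch idea concrete, here is a minimal Python sketch of a billing job that runs over a whole batch of usage records at once. The record layout `(customer_id, units_used, price_per_unit)` and the function name `run_billing_batch` are hypothetical, chosen only for illustration; a real job would read its input from files or tables once per billing cycle.

```python
from collections import defaultdict

# Hypothetical usage records: (customer_id, units_used, price_per_unit).
# A real batch job would load these from storage once per billing cycle.
def run_billing_batch(records):
    """Process the entire batch in one pass and return one bill total per customer."""
    totals = defaultdict(float)
    for customer_id, units, price in records:
        totals[customer_id] += units * price
    # Results only become available after the whole batch has been processed.
    return dict(totals)


if __name__ == "__main__":
    batch = [
        ("alice", 10, 0.5),
        ("bob", 4, 1.25),
        ("alice", 6, 0.5),
    ]
    print(run_billing_batch(batch))
```

No bill is available until the entire batch has been processed, which is the trade-off batch processing accepts in exchange for handling large volumes efficiently.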

Batch processing is a good fit when you don’t need real-time analytics results and when processing large amounts of data matters more than getting results quickly (though data streams can also involve “big” data).

Stream Processing:

If you need analytics results in real time, stream processing is essential. With platforms such as Spark Streaming, you can feed data into analytics tools as soon as it is produced and get near-instant results.
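The following plain-Python sketch shows the core idea of stream processing: the result is updated as each event arrives rather than after all the data has been collected. It does not use the Spark Streaming API; the generator `event_source` is a hypothetical stand-in for a live feed such as a message queue.

```python
from typing import Iterable, Iterator


def event_source(values: Iterable[float]) -> Iterator[float]:
    """Hypothetical stand-in for a live feed: yields one event at a time."""
    for value in values:
        yield value


def running_average(events: Iterable[float]) -> Iterator[float]:
    """Update the result incrementally as each event arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        # An up-to-date result is emitted after every single event.
        yield total / count


if __name__ == "__main__":
    for avg in running_average(event_source([10, 20, 30])):
        print(avg)
```

Because `running_average` never needs the full dataset, it can keep producing fresh results for as long as the feed keeps delivering events.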

Stream processing is beneficial for tasks such as fraud detection. Stream-processing transaction data allows you to detect discrepancies that indicate fraud in real time and stop suspicious transactions before they can be completed.
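A simplified sketch of streaming fraud detection, assuming a hypothetical rule: a transaction is flagged when its amount exceeds five times the card’s average of earlier transactions, once at least three earlier transactions have been seen. Real fraud systems use far richer models; the names and thresholds here (`detect_suspicious`, `MULTIPLIER`, `MIN_HISTORY`) are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical rule parameters, for illustration only.
MULTIPLIER = 5.0   # flag amounts above 5x the card's average so far
MIN_HISTORY = 3    # do not judge a card until it has a few earlier transactions


def detect_suspicious(
    transactions: Iterable[Tuple[str, float]],
) -> Iterator[Tuple[str, float]]:
    """Check each (card_id, amount) as it arrives and yield suspicious ones immediately."""
    count: Dict[str, int] = defaultdict(int)
    total: Dict[str, float] = defaultdict(float)
    for card_id, amount in transactions:
        n = count[card_id]
        if n >= MIN_HISTORY and amount > MULTIPLIER * (total[card_id] / n):
            # In a real system, this is where the transaction would be held for review.
            yield card_id, amount
        # Every transaction (flagged or not) is added to the card's history.
        count[card_id] += 1
        total[card_id] += amount


if __name__ == "__main__":
    feed = [("c1", 20.0), ("c1", 25.0), ("c1", 30.0), ("c1", 400.0), ("c2", 50.0)]
    for card, amount in detect_suspicious(feed):
        print(f"suspicious: {card} {amount}")
```

Each transaction is evaluated the moment it arrives, so the 400.0 charge on card `c1` is flagged before any later events are even read.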

Core Difference:

In batch processing, computations run over all or most of the data; in stream processing, they run over the most recent record (or a small set of recent records). Thus, batch processing deals with large batches of data, whereas stream processing deals with individual records or micro-batches of a few records.
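The difference between computing one result over all the data and computing results per micro-batch can be sketched in a few lines of Python. The helper `micro_batches` is a hypothetical name for this example (Python 3.12 and later also provide `itertools.batched`, which yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group an incoming stream into consecutive batches of at most `size` records."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    records = range(1, 8)
    # Batch style: a single result computed over all of the data at the end.
    print(sum(records))
    # Micro-batch style: a partial result emitted for every few records.
    for batch in micro_batches(records, 3):
        print(batch, sum(batch))
```

The batch-style total appears only once, after every record has been read, while the micro-batch loop produces output after every three records.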

In terms of latency, batch processing typically delivers results after a delay of minutes to hours, whereas stream processing delivers them within seconds or milliseconds.

Liked the content? Let me know your feedback ✍️

Follow me for more 💡

Originally published at https://www.linkedin.com.

Written by Sayan Chowdhury

Design Engineer at Larsen and Toubro • Data Enthusiast • Google Cloud Platform
