A very large part of today's data processing is done on data that is continuously produced, e.g., data from user activity logs, web logs, machines, sensors, and database transactions. Until recently, data streaming technology was lacking in several areas, such as performance, correctness, and operability, forcing users either to roll their own applications to ingest and analyze these continuous data streams, or to (ab)use batch processing tools to simulate continuous ingestion and analysis pipelines.
This is no longer the case: streaming technology has now matured and is being rapidly adopted by enterprises. Writing applications on top of streaming data has several benefits:
• It decreases overall latency, as data does not first have to be moved to a filesystem or a database before it becomes valuable for analytics.
• It simplifies the architecture of the data infrastructure, as there are far fewer moving parts to maintain and coordinate in order to get timely answers from the data.
• It makes the time dimension explicit, associating each data record with its timestamp and allowing analytics to process and group events based on these timestamps, as illustrated by the sketch after this list.
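
To make the last point concrete, here is a minimal sketch of event-time grouping: events are assigned to tumbling windows according to the timestamps they carry, not according to when they happen to arrive at the processor. The window size, the (timestamp, payload) record format, and the sample events are all assumptions for illustration, not the API of any particular streaming framework.

    from collections import defaultdict

    WINDOW_SIZE = 60  # hypothetical tumbling-window length, in seconds

    def window_start(ts):
        # Align an event's own timestamp to the start of its tumbling window.
        return ts - (ts % WINDOW_SIZE)

    def count_per_window(events):
        # Group events by the timestamp each record carries (event time),
        # not by the time it happens to reach the processor.
        counts = defaultdict(int)
        for ts, payload in events:
            counts[window_start(ts)] += 1
        return dict(counts)

    # Events may arrive out of order; grouping still follows their timestamps.
    events = [(1000.0, "click"), (1075.0, "click"), (1010.0, "view")]
    print(count_per_window(events))  # {960.0: 2, 1020.0: 1}

A production system would additionally need watermarks or a similar mechanism to decide when a window can be considered complete despite late-arriving events.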