Streaming ETL with Impetus StreamAnalytix

Streaming Big Data ETL with Impetus StreamAnalytix and Syncsort DMX – Guest Blog

By Gregory Grubbs | Apr 01, 2015

Streaming Big Data ETL with Impetus StreamAnalytix and Syncsort

Today we are announcing a partnership between Syncsort and Impetus Technologies, and our entry into an integration of batch processing and real-time stream processing that we call “Streaming ETL”. The mix of batch and real-time processing has also been referred to as the Lambda Architecture. Streaming ETL allows a mixing of the best batch and streaming technologies under the umbrella of tools which abstract the complexity of the underlying platforms.

The huge increase in types and sources of data has placed pressure on companies to blend and summarize that data quickly to create actionable information. A combination of real time and batch processing is needed to meet the new demands.

There’s a grab bag of technologies that excel in specific aspects: Hadoop Mapreduce, Storm and Spark for massively parallel processing; Kafka and Spark Streaming along with traditional messaging and queuing software for real time data movement; Mesos and YARN for cluster management. These components can be mixed and matched, but there are many APIs to learn and different skill sets needed to leverage them well.

StreamAnalytix definition with Syncsort DMX bolt

DMX task that performs lookup for IP

Syncsort and Impetus Technologies abstract away the complexity

Impetus Technologies developed StreamAnalytix to ease the process of building, deploying and monitoring real-time Big Data applications. StreamAnalytix provides an abstraction over the complex technologies used in Big Data platforms (like Storm and Kafka).

Syncsort DMX-h gives users an easy way to specify data transformations and data enrichment, and provides an extremely efficient, high performance run time engine. We have provided this engine to be used in StreamAnalytix streaming pipelines. The DMX engine is called whenever the user needs to transform or enrich the data.

Common use cases for Streaming ETL

  • A clickstream analyst requires an IP address in a web log to be converted to a decimal or hexadecimal value for downstream processing
  • A mobile app sends a product identifier used on a web retail site, and that identifier has to be transformed to the product ID used in the Enterprise Data Warehouse using a lookup
  • A hardware manufacturer sends real time testing data from many sites, which needs to be correlated with historical data

On the Syncsort side, we made only minor adjustments to enable the DMX run time engine to allocate less memory than it does for large batch operations, and to expose a Java interface to integrate with StreamAnalytix. The Impetus engineering team quickly added a component in their excellent Web UI to allow users to add one or more DMX transformations in any stream. The teams worked together to publish and interpret metadata from the two products.  In the coming months, we will be working to add features and enhance the integration.

I will be producing a screencast of this new Streaming ETL software, and both web sites will have more information about this exciting joint solution.   We look forward to hearing from you with any questions and feedback.

You may also be interested in…



Key considerations for moving ETL workloads and enabling self-service ETL on cloud

The exponential growth of data across industries is fuelling the evolution of extract, transform, and load (ETL) processes.


Modernize your ETL processes with StreamAnalytix

Businesses are struggling with huge volumes of data to solve complex business problems while relying on their legacy data platform…

Case Study

Real-Time Multi-Lingual Classification and Sentiment Analysis of Text Using StreamAnalytix

Challenges A major telecom company providing nationwide telecom services wanted a system that performs real-time, multi-lingual classification and sentiment analysis…

Case Study

Real-time insider threat detection using machine learning for a Fortune 500 bank

Learn how a large US-based bank used predictive analytics and machine learning to identify and prevent insider information security threats…

White Paper

Top 6 Considerations for a Streaming Analytics Platform

All data is first generated in real-time, be it financial transactions, sensor data, web clickstreams, geolocation data, weather reports, market…


Automated migration from on-premise Hadoop to Databricks and Delta Lake using StreamAnalytix

Most enterprises are undertaking a digital transformation initiative. Data and analytics modernization is an integral part of this journey. On-premise…

Start your free trial

of StreamAnalytix



StreamAnalytix Lite Now

Schedule a Demo