Build, test, and run your Apache Spark ETL and machine learning applications faster than ever

By Punit Shah | Jun 25, 2019

Start building Apache Spark pipelines within minutes on your desktop with the new StreamAnalytix Lite.

Manually developing and testing code on Spark is complicated and time-consuming, and can significantly delay time to market. A visual low-code solution, on the other hand, can simplify and accelerate Spark development.

StreamAnalytix Lite, a lightweight, self-service data flow and analytics platform, has transformed the experience of developing and running Spark applications, making it visual, faster, and easier – right on your desktop, at no cost.

StreamAnalytix Lite, a developer edition of the StreamAnalytix Enterprise Edition, retains all its capabilities and features to build, test and run enterprise-grade Spark applications 10x faster vs. hand coding. It offers an intuitive drag-and-drop visual interface to instantly transform your journey with Spark on a desktop or a single node.

With its new release, StreamAnalytix Lite has further enhanced Spark development with a richer set of connectors, 150+ built-in Spark operators, interactive development, enhanced self-service features, and better collaboration for multiple users. It also offers additional features such as auto-schema detection, test suite support, user recommendations, error detection at design time, use of Notebooks, and more. Further, it lets you add hand-written custom logic in the language of your choice (Java, Scala, or Python).

StreamAnalytix Lite comes with an array of support documentation, resources, and built-in sample pipelines to onboard you and expedite your journey with the platform. Some critical features offered by StreamAnalytix Lite are:

  • Build and run enterprise-grade Spark applications on your desktop: End-to-end application life cycle management – build, test, debug, deploy, and manage in a unified platform
  • End-to-end ETL capabilities: Visually perform data cleansing, data blending, and data enrichment to transform batch as well as streaming data
  • Built-in advanced analytics and machine learning capabilities: Use built-in analytical operators like Spark MLlib, Spark ML, PMML, TensorFlow, and H2O
  • Visual and interactive development: Use an intuitive drag-and-drop interface, built-in processors, and a visual pipeline designer
  • Self-service platform to create data flows: Interact with the data as you build your data flows, leverage auto schema detection and data profiling, get auto-generated user recommendations, and more
  • Get all Spark features in one unified development tool: Use a wide array of built-in Spark operators for data sources, transformations, machine learning, and data sinks. Support for Spark 2.3 and Spark Structured Streaming makes it easier to build production-grade continuous applications that handle out-of-sync data better, maintain greater consistency within data streams, and join streams with static data sources more efficiently.
  • Use powerful multi-tenancy features: Multiple users can connect to a single instance through a web-based interface

Experience the ease of Spark application development with StreamAnalytix Lite

The screen below shows the pipeline designer and the “Inspect” feature of StreamAnalytix Lite where a developer builds and iteratively validates a Spark pipeline by injecting sample test records and seeing the data changes at each step of the flow.
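
As a rough, hand-coded analogue of this inspect-as-you-build workflow, the sketch below runs a few sample records through a small pipeline and keeps the output of every step so each change can be examined. It is plain Python, not Spark or the StreamAnalytix API, and every name in it (`inspect_pipeline`, the step functions, the sample fields) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Step = Tuple[str, Callable[[List[Record]], List[Record]]]


def drop_missing_amount(records: List[Record]) -> List[Record]:
    """Cleansing: discard records that have no amount."""
    return [r for r in records if r.get("amount") is not None]


def normalize_country(records: List[Record]) -> List[Record]:
    """Cleansing: trim and upper-case the country code."""
    return [{**r, "country": str(r["country"]).strip().upper()} for r in records]


def add_amount_band(records: List[Record]) -> List[Record]:
    """Enrichment: tag each record with a coarse amount band."""
    return [{**r, "band": "high" if r["amount"] >= 100 else "low"} for r in records]


def inspect_pipeline(sample: List[Record], steps: List[Step]):
    """Run the sample through each step and keep every intermediate result."""
    snapshots = [("input", list(sample))]
    current = list(sample)
    for name, step in steps:
        current = step(current)
        snapshots.append((name, current))
    return snapshots


# Hypothetical sample test records injected at the start of the flow.
sample_records = [
    {"id": 1, "country": " us ", "amount": 250},
    {"id": 2, "country": "in", "amount": None},
    {"id": 3, "country": "De", "amount": 40},
]

steps = [
    ("drop_missing_amount", drop_missing_amount),
    ("normalize_country", normalize_country),
    ("add_amount_band", add_amount_band),
]

# Print the state of the data after every step of the flow.
snapshots = inspect_pipeline(sample_records, steps)
for name, rows in snapshots:
    print(f"{name}: {len(rows)} record(s)")
    for row in rows:
        print("   ", row)
```

Each step returns new records instead of modifying its input, so the snapshot taken after one step is not altered by later steps and the sample records can be reused for the next iteration.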

Who should use StreamAnalytix Lite

Developers, business analysts, data scientists, and DevOps specialists can use StreamAnalytix Lite to build unlimited Spark workflows on their desktop (Windows, Mac, or Linux) or any single node.

Recommended StreamAnalytix Lite usage

StreamAnalytix Lite is a learning, experimentation, and development tool that makes Spark development easy for a wide range of users. It is not recommended as an execution platform for production applications. Big data processing applications or pipelines built on StreamAnalytix Lite can be seamlessly exported to the production-grade (Enterprise) edition of the StreamAnalytix platform to run at full enterprise scale on multi-node Spark clusters.

About StreamAnalytix

StreamAnalytix is an enterprise-grade, visual, self-service data flow and analytics platform for unified streaming and batch data processing, based on best-of-breed open-source technologies. It supports the end-to-end functionality of data ingestion, enrichment, machine learning, action triggers, and visualization. StreamAnalytix offers an intuitive drag-and-drop visual interface to build and operationalize big data applications five to ten times faster, across industries, data formats, and use cases.
