Apache Spark: The New Enterprise Backbone for ETL, Batch and Real-time Streaming

Despite investments in big data lakes, many enterprises still rely on expensive proprietary products for data ingestion, integration, and transformation (ETL) when bringing data into the lake and processing it there.

Enterprises have successfully tested Apache Spark for its versatility and strengths as a distributed computing framework that can handle data processing, analytics, and machine learning workloads.

Since Hadoop distributions and public cloud platforms already include Apache Spark, there is nothing new to procure. However, the skills required to put Spark to good use are often in short supply.

In this webinar, we will discuss how Apache Spark can be an inexpensive enterprise backbone for all types of data processing workloads. We will also demo a visual framework on top of Apache Spark that makes it more viable for teams without deep Spark skills.

The following scenarios will be covered:


  • Data quality and ETL with Apache Spark using pre-built operators
  • Advanced monitoring of Spark pipelines

On Cloud

  • Visual interactive development of Apache Spark Structured Streaming pipelines
  • IoT use-case with event-time, late-arrival and watermarks
  • Python-based predictive analytics running on Spark
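
The event-time, late-arrival and watermark ideas named in the IoT scenario above can be illustrated with a small plain-Python sketch. It is not Spark's API and only approximates what a streaming engine does: the function name `windowed_counts`, the 60-second window and the 30-second lateness allowance are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def windowed_counts(events, window_seconds=60, allowed_lateness=30):
    """Count events per event-time window, dropping events behind the watermark.

    events: iterable of (event_time_seconds, device_id) in ARRIVAL order.
    Returns (counts, dropped), where counts maps window start -> event count
    and dropped lists the events rejected as too late.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None  # highest event time seen so far

    for event_time, device_id in events:
        if max_event_time is not None:
            # Simplified watermark: highest event time seen minus allowed lateness.
            watermark = max_event_time - allowed_lateness
            if event_time < watermark:
                dropped.append((event_time, device_id))
                continue
        # Assign the event to a window by its own timestamp, not its arrival time.
        window_start = (event_time // window_seconds) * window_seconds
        counts[window_start] += 1
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time

    return dict(counts), dropped


if __name__ == "__main__":
    # Event 50 arrives slightly out of order and is kept;
    # event 10 arrives after the watermark has passed it and is dropped.
    stream = [(5, "a"), (65, "b"), (50, "a"), (130, "c"), (10, "b")]
    counts, dropped = windowed_counts(stream)
    print(counts)
    print(dropped)
```

In this sketch the watermark advances with every event; an engine such as Spark typically advances it per micro-batch and compares it against window boundaries, so exact drop behavior will differ.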

Streaming Analytics for IoT with Apache Spark

Modern IoT operations can drive digital transformation by analyzing, in real time, the unprecedented amounts of data generated by devices and sensors.

Apache Spark is a widely used stream processing engine for real-time IoT applications. Spark Streaming offers a rich set of APIs for ingestion, cloud integration, multi-source joins, blending streams with static data, time-window aggregations, transformations, and data cleansing, along with strong support for machine learning and predictive analytics.

Join Anand Venugopal, AVP & Business Head, StreamAnalytix and Sameer Bhide, Senior Solutions Architect, StreamAnalytix to learn about the rapid development and operationalization of real-time IoT applications covering an end-to-end flow of ingest, insight, action, and feedback.

The webinar will cover the following:

  • Generic IoT application blueprint
  • Case studies on IoT applications built on Apache Spark – connected car and industrial IoT
  • Demonstration of an easy, visual approach to building IoT Spark apps

Anomaly Detection: Real World Scenarios, Approaches, and Live Implementation

Detecting anomalous patterns in data can lead to significant actionable insights in a wide variety of application domains, such as fraud detection, network traffic management, predictive healthcare, energy monitoring and many more.

However, detecting anomalies accurately can be difficult. What qualifies as an anomaly is continuously changing and anomalous patterns are unexpected. An effective anomaly detection system needs to continuously self-learn without relying on pre-programmed thresholds.
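
One way to picture a detector that learns its baseline from the data rather than from a fixed limit is the plain-Python sketch below. It is an illustrative assumption, not the approach presented in the webinar: the class name `RunningAnomalyDetector`, the warm-up length and the z-score cutoff are hypothetical, and the cutoff itself remains a tunable parameter even though the mean and spread are learned continuously (using Welford's online algorithm).

```python
import math


class RunningAnomalyDetector:
    """Flags values that deviate strongly from a continuously learned baseline."""

    def __init__(self, z_cutoff=3.0, warmup=5):
        self.z_cutoff = z_cutoff  # how many standard deviations count as anomalous
        self.warmup = warmup      # observations to learn from before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0             # running sum of squared deviations from the mean

    def observe(self, value):
        """Score value against the baseline learned so far, then learn from it."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.z_cutoff:
                is_anomaly = True
        # Welford's online update: every value, flagged or not, is folded in,
        # so the baseline follows gradual shifts in the data.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


if __name__ == "__main__":
    detector = RunningAnomalyDetector()
    readings = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 35.0, 20.0]
    print([detector.observe(r) for r in readings])
```

Because flagged values also update the baseline, a single spike temporarily widens the learned spread; a production system would likely need a more robust scheme, such as excluding or down-weighting flagged points.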

Join our speakers Ravishankar Rao Vallabhajosyula, Senior Data Scientist, Impetus Technologies and Saurabh Dutta, Technical Product Manager – StreamAnalytix, in a discussion on:

  • Importance of anomaly detection in enterprise data, types of anomalies, and challenges
  • Prominent real-time application areas
  • Approaches, techniques and algorithms for anomaly detection
  • Sample use-case implementation on the StreamAnalytix platform

Apache Spark Empowering the Real-time, Data Driven Enterprise: The De Facto Choice for Stream Processing and Machine Learning

Apache Spark is one of the most popular Big Data frameworks today. It is fast becoming the de facto technology choice for stream processing, real-time analytics, data science and machine learning applications at scale. It has moved well beyond the early-adopter phase, is supported by a vibrant open source community and is enjoying accelerated adoption in enterprises.

Join our guest speaker Mike Gualtieri, VP & Principal Analyst at Forrester Research, and Anand Venugopal, Product Head at StreamAnalytix, for a discussion on the trends and directions defining the growing importance of Apache Spark for stream processing, machine learning and other advanced data analytics applications.

The webinar will cover the following topics:

  • What is driving Spark adoption? What are the influencers, trends, compelling capabilities and use cases?
  • What are some of the challenges or inhibitors?
  • Impetus introduces Visual Spark Studio – a free, newly available downloadable IDE that offers breakthrough productivity for learning, developing and deploying Spark-based real-time and advanced analytical applications.
  • Impetus customer success stories around real-time solutions with Spark/StreamAnalytix.

The Structured Streaming Upgrade to Apache Spark and How Enterprises Can Benefit

The adoption of Apache Spark to analyze data in real time is increasing, driven by its ability to handle sophisticated analytical requirements and its common framework for streaming and batch. However, most organizations are also looking for “true streaming” features like lower latency and the ability to process out-of-order data.

Structured Streaming, a new high-level API introduced in Apache Spark 2.0, promises these and other enhancements to Spark's approach to streaming data processing.

In this webinar, Anand Venugopal (Product Head) and other technical experts from StreamAnalytix will speak about the promising developments in Apache Spark 2.0 and how organizations can leverage Structured Streaming to make timely and accurate decisions and stay competitive.

In this webinar you will learn:

  • Evolution of Spark and its functionality to date including version 2.2
  • Structured Streaming – Technical overview, benefits and limitations
  • How to integrate Structured Streaming with the surrounding stack
  • Talent vs. tooling

Real-time Data360 on Apache Spark

‘Data360’ is a new term for a one-stop shop for all your big data processing needs.

Enterprise IT teams are faced with the challenge of choosing one vendor for data ingest, another for data wrangling, a third one for machine learning/analytics and yet another for visualization.

Shouldn’t it be really easy to do all this in a unified way, especially if you have already chosen Spark as your Big Data platform? It can be; however, using Spark to its full power still requires highly skilled Scala/Java programmers. A different approach is needed.

During this webinar you will learn about:

  • A powerful all-in-one Apache Spark strategy for the enterprise and an implementation approach for end-to-end big data analytics processing
  • The elements of a real-time Data360 solution – Ingest, Cleanse, Transform, Blend, Analyze, Load and Visualize
  • A combination of tools and tactics used for Data360 on streaming and historical data, using Apache Spark and Apache Spark Streaming
  • How use cases like anomaly detection, customer 360, IoT and log analytics, fraud and security analytics and many more can be achieved using this approach

Self Service Pre-built Pipelines for Building Real-time Streaming Apps

Streaming analytics is fast becoming a must-have technology for enterprises seeking to gain competitive advantage from data.

There is a growing demand for these new real-time applications and use-cases to be created and deployed quickly. In order to be efficient, enterprises need to take creative and collaborative approaches which maximize re-use.

Join this webinar by the Impetus team of experts who are helping Fortune 1000 enterprises implement real-time streaming analytics.

During this webinar session, you will get:

  • An overview of re-usable patterns in streaming analytics applications development
  • An introduction to an enterprise-level streaming analytics strategy for 2017 and beyond
  • A visual IDE-based approach to building, maintaining and operating stream-processing applications
  • Guidance on leveraging multiple technologies like Apache Storm and Spark Streaming within a single real-time pipeline
  • A demo of A/B testing of predictive models and performing run-time model upgrades with no downtime

Partner Webinar – With Fannie Mae, Hortonworks and Impetus: The Business Impact of Fast Data Analytics

The promise of big data is greater than ever before due to an explosion in the number and variety of data sources. This has caused a shift from traditional structured, batch or periodic data warehouse environments to today’s more complex combination of structured, semi-structured and unstructured data, along with the requirement to apply analytics in real time. To determine how best to deliver on the full promise of this opportunity, enterprises today have to sort through an often confusing array of commercial and open source solutions.

This webinar will feature a real-world example describing how Fannie Mae worked with partners Hortonworks and Impetus Technologies to develop a streamlined solution to:

  • Reduce the cost and complexity of their data infrastructure by leveraging more efficient and effective big data and fast data ingestion platforms
  • Deliver net new analytics capabilities
  • Apply data quality checks and data enrichment at the point of ingest through the use of real-time analytics

Spark Streaming Made Easy!

Real-time streaming analytics and IoT seem to be the next big thing in the data and analytics industry. As enterprises adopt Apache Spark and Spark Streaming widely, IT teams face the challenge of providing the tools and framework needed to make Apache Spark Streaming an easy-to-use, robust, scalable and multi-tenant service.

Join this webinar from the StreamAnalytix team at Impetus Technologies to see how this problem is being solved at many Fortune 1000 companies.

This webinar will cover:

  • An overview of the stream processing landscape
  • The need for a “Streaming platform” integrated with the Hadoop data lake
  • A visual IDE approach for building applications on Spark Streaming
  • The usage of various Spark Streaming operators in sample applications
    • Spark SQL, Window, MLlib, Join, custom Scala code, etc.
  • Real-time Dashboards, App Deployment & Monitoring

Harnessing the Firehose: Getting Business Value from Streaming Analytics

The Briefing Room with Dr. Robin Bloor, Dez Blanchfield and StreamAnalytix – Impetus Technologies

Do you feel the pace of business quickening? That’s the reality in the analytics world, as companies continue to shorten the time window between data acquisition and analytical business value. A range of market factors are forcing the hands of decision-makers across several industries: retailers trying to satisfy customer needs in Web-time; banks and insurance companies working to prevent fraud; manufacturers aiming to avoid machine downtime; telcos trying to minimize churn. The business needs are all over the map.

Register for this episode of Hot Technologies to hear veteran analysts Dr. Robin Bloor and Dez Blanchfield as they examine the fundamentals of streaming analytics and how today’s leading-edge technologies can solve business problems at network speed. They’ll be briefed by Anand Venugopal of Impetus Technologies, who will showcase his company’s platform, StreamAnalytix, which is designed to accommodate the variety of streaming analytics engines available today. He’ll explain how the company’s mission to accelerate and ‘future-proof’ streaming analytical functionality is generating results.