Despite investments in big data lakes, expensive proprietary products are still widely used for data ingestion, integration, and transformation (ETL) when bringing data onto the lake and processing it there.
However, enterprises have successfully tested Apache Spark for its versatility and strengths as a distributed computing framework that can handle end-to-end needs for data processing, analytics, and machine learning workloads.
In this webinar, we will discuss why Apache Spark is a one-stop shop for all data processing needs. We will also demonstrate how a visual framework on top of Apache Spark makes adopting it much more viable.
The following scenarios will be covered:
- Data quality and ETL with Apache Spark using pre-built operators
- Advanced monitoring of Spark pipelines
- Visual interactive development of Apache Spark Structured Streaming pipelines
- IoT use case with event time, late-arriving data, and watermarks
- Python-based predictive analytics running on Spark
Anand is a techno-business leader at Impetus Technologies, providing product strategy, product marketing, and sales leadership for the StreamAnalytix business. He is focused on evangelizing and delivering real business value from big data and fast data analytics to Fortune 1000 enterprises. Having spoken at numerous big data conferences on topics including big data use cases, ROI, real-time streaming analytics, and the enterprise big data bus, Anand is a well-known thought leader in the big data ecosystem. He brings 22+ years of software technology, architecture, and go-to-market experience in hi-tech, telecom, mobile, gaming, and enterprise big data and analytics systems to enrich his speaking engagements.
Ensure successful data ingestion on the cloud: Strategies for 2021
Mar 19, 2021 | 11:00 am PT / 2:00 pm ET