The Data Lineage feature captures the life cycle of data as it passes through pipeline components. It enables you to track the data at every stage as it flows through the pipeline.
The Data Lineage button is available on the pipeline tile. It is enabled for both Spark and Storm pipelines.
The Data Lineage functionality works in a similar manner for Storm and Spark.
The Configure option of Data Lineage enables you to perform the following operations:
- Enable or disable the Data Lineage feature.
- Configure Audit ID generation rules.
- Determine the Kafka sink to be used for Data Lineage.
Audit ID: You can assign an Audit ID to each message flowing in the pipeline. For example, if thousands of messages are flowing in the pipeline and you want to find one or two of them, searching without an identifier is tedious. With an Audit ID, you can locate those messages in a few seconds.
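The idea behind an Audit ID can be sketched in a few lines of Python. This is only an illustration: the rule shown (a prefix, a sequence number, and a random suffix) and the names `make_audit_id_generator` and `audit_id` are assumptions made for the example, not the generation rules that the product actually offers.

```python
import itertools
import uuid


def make_audit_id_generator(prefix):
    """Return a function that stamps each message with a unique audit ID."""
    counter = itertools.count(1)

    def tag(message):
        # Hypothetical rule: <prefix>-<sequence number>-<random suffix>
        audit_id = f"{prefix}-{next(counter):06d}-{uuid.uuid4().hex[:8]}"
        # Return a copy so the original message is left untouched.
        return {**message, "audit_id": audit_id}

    return tag


tag = make_audit_id_generator("orders")
messages = [tag({"name": "John"}), tag({"name": "Jane"})]

# Indexing by audit ID turns "find one message among thousands" into a lookup.
by_audit_id = {m["audit_id"]: m for m in messages}
```

Once every message carries such an identifier, finding a single message is a direct lookup rather than a scan of the whole stream.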
Sink: After auditing is enabled and a unique ID is assigned to the messages, the data continues to flow through the pipeline and is finally written to Kafka.
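The flow described above might look roughly like the following sketch. It assumes that one lineage event is recorded per pipeline stage and uses a plain Python list as a stand-in for the Kafka sink; the event fields (`audit_id`, `stage`, `payload`) and the function `run_with_lineage` are hypothetical names chosen for illustration.

```python
import json


def run_with_lineage(message, stages, sink):
    """Pass a message through each stage and record one lineage event per stage."""
    for stage_name, transform in stages:
        message = transform(message)
        event = {
            "audit_id": message["audit_id"],
            "stage": stage_name,
            "payload": message["payload"],
        }
        # Stand-in for the Kafka sink: a real pipeline would publish the
        # event to the configured Kafka topic instead of appending to a list.
        sink.append(json.dumps(event))
    return message


sink = []
stages = [
    ("parse", lambda m: {**m, "payload": m["payload"].strip()}),
    ("enrich", lambda m: {**m, "payload": m["payload"].upper()}),
]
result = run_with_lineage({"audit_id": "a-1", "payload": " john "}, stages, sink)
```

Because every event carries the same audit ID, the events in the sink can later be grouped to reconstruct the path of a single message.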
Data Lineage also enables you to view or search the Lineage data from its source to its destination.
There are three options for searching the Lineage data:
- Exact Match
- Prefix Search
- Time Range Search
Exact Match: An exact match search looks for the exact phrase that you enter. For example, if you enter John in the Row Id field, the search results display all the messages that contain John.
Prefix Search: A prefix search matches the prefix that you enter in the search criterion. For example, if you enter the prefix J in the row search criteria, the results display all the records in which the message name starts with J.
Time Range Search: You can specify the date and time range for which you want to view the records. For example, if you enter the date 12.12.2016 and specify the time range from 13:44:09 to 13:50:09, the system fetches all the records that fall within that range.
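The three search options can be illustrated with a small in-memory sketch. The record layout (`row_id`, `time`) and the function names are assumptions made for the example, and Exact Match is modeled here as equality on the Row Id value; the product's own matching rules may differ.

```python
from datetime import datetime

# Hypothetical lineage records, one per message.
records = [
    {"row_id": "John", "time": datetime(2016, 12, 12, 13, 45, 0)},
    {"row_id": "Jane", "time": datetime(2016, 12, 12, 13, 49, 30)},
    {"row_id": "Mary", "time": datetime(2016, 12, 12, 14, 5, 0)},
]


def exact_match(records, value):
    """Keep records whose row ID equals the search value."""
    return [r for r in records if r["row_id"] == value]


def prefix_search(records, prefix):
    """Keep records whose row ID starts with the given prefix."""
    return [r for r in records if r["row_id"].startswith(prefix)]


def time_range_search(records, start, end):
    """Keep records whose timestamp lies between start and end, inclusive."""
    return [r for r in records if start <= r["time"] <= end]


exact = exact_match(records, "John")
prefixed = prefix_search(records, "J")
in_window = time_range_search(
    records,
    datetime(2016, 12, 12, 13, 44, 9),
    datetime(2016, 12, 12, 13, 50, 9),
)
```

With this sample data, the exact match on John selects one record, the prefix J selects John and Jane, and the time window from 13:44:09 to 13:50:09 also selects John and Jane, since Mary's record falls outside it.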