Tuesday, April 14, 2015

What is Metadata?

Metadata is data about data. It is descriptive information about a particular data set, object, or resource, including how it is formatted, and when and by whom it was collected.

In terms of Talend Open Studio, metadata refers to reusable configurations that describe the data, its attributes, or its containers. For example, we could define metadata in the Studio that describes an XML schema, a web service definition, or an FTP connection.

Once the metadata is defined, it can be used across multiple jobs, and it gives you a single place to update configurations that many jobs share. For example, if the password to an FTP account changes and the connection is configured separately in 10 different jobs, the details have to be updated 10 times. However, if you store this configuration in a single metadata entry, it only needs to be updated once. In short, metadata exists for reusability.
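The FTP example can be sketched outside of Talend in a few lines of plain Python. This is only a conceptual illustration, not Talend's API: the `FtpConnection` and `Job` classes, the host name, and the credentials are all hypothetical. The point it shows is that 10 jobs holding a reference to one shared definition all see a password change made in one place.

```python
from dataclasses import dataclass


@dataclass
class FtpConnection:
    """Hypothetical stand-in for one FTP connection stored as metadata."""
    host: str
    username: str
    password: str


@dataclass
class Job:
    """Hypothetical job that points at a shared connection instead of copying it."""
    name: str
    connection: FtpConnection


# One shared definition (assumed values, for illustration only).
shared_ftp = FtpConnection(host="ftp.example.com", username="etl_user", password="old-secret")

# Ten jobs all reference the same connection object.
jobs = [Job(name=f"job_{i}", connection=shared_ftp) for i in range(1, 11)]

# The password changes: a single update on the shared definition.
shared_ftp.password = "new-secret"

# Every job now sees the new password without being edited individually.
passwords_seen = {job.connection.password for job in jobs}
print(passwords_seen)
```

If each job kept its own copy of the connection details instead, the same change would need 10 separate edits.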


There are different types of metadata in Talend:

Schema Metadata

This allows you to define database connections and then import objects from those databases, such as tables and views.
For successful development of jobs, it is important to create the metadata for your data source. Failure to define the data correctly will result in numerous errors and in time wasted tracking down problems with data formats that could otherwise have been avoided.
Talend provides a host of wizards for capturing metadata from a variety of data sources, such as database tables, delimited files, and Excel worksheets, and stores it within its built-in metadata repository.
Repository schema
Although you can manually define schemas for each of your components, a much easier approach is to define them through Repository Metadata. This is especially important if you intend to use the same schema definition more than once in your Job, or in multiple Jobs.
Many of these repository definitions come with wizards that can not only read a source definition to define your schema, but can also sample the data to determine attributes that cannot be obtained from the source definition alone.
A good example of this is Repository->Metadata->File delimited.
You can also define Generic schemas, where a specific schema type is not available.
The benefits of using Repository schemas are:
1. They can be re-used across multiple jobs, thus reducing the amount of re-keying.
2. Talend will ensure that changes made to a Repository schema are cascaded to all jobs that use the schema, thus avoiding the need to scan jobs manually for Built-In schemas that need to be changed.
3. Impact analysis reports can be generated showing where a Repository schema is being used within a project. This enables the impact of changes to be assessed more accurately when planning changes to any underlying data sources.
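The idea behind impact analysis can be sketched as a simple lookup from a schema to the jobs that use it. This is a conceptual illustration in plain Python, not how Talend implements its reports: the `register` and `impacted_jobs` helpers and the job and schema names are hypothetical.

```python
from collections import defaultdict

# Maps each schema name to the set of jobs that use it (hypothetical registry).
schema_usage = defaultdict(set)


def register(job_name, schema_name):
    """Record that a job uses a given repository schema."""
    schema_usage[schema_name].add(job_name)


def impacted_jobs(schema_name):
    """Return the jobs that would be affected by a change to the schema."""
    return sorted(schema_usage.get(schema_name, set()))


# Assumed example data.
register("load_customers", "customer_schema")
register("export_customers", "customer_schema")
register("load_orders", "order_schema")

print(impacted_jobs("customer_schema"))
```

Because usage is recorded centrally, the question "what breaks if this schema changes?" becomes a single lookup rather than a manual scan of every job.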
Generic schemas
An existing schema can easily be converted into a generic schema so that it can be re-used.
Generic schemas aren't tied to any particular source, so they can be used as a shared resource across multiple types of data source, or to describe data that is generated, such as the output of custom SQL queries.
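As a rough illustration of a schema that is not tied to one source, the sketch below applies the same column definition to rows read from delimited text and to rows returned by a custom SQL query. It is plain Python rather than Talend: the `Column` class, `CUSTOMER_SCHEMA`, `apply_schema`, and the sample data are all hypothetical, and an in-memory SQLite query stands in for the database.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column of a source-independent schema: a name and a Python type."""
    name: str
    py_type: type


# Hypothetical generic schema, defined once and shared by both sources below.
CUSTOMER_SCHEMA = (
    Column("id", int),
    Column("name", str),
    Column("balance", float),
)


def apply_schema(schema, raw_row):
    """Convert a raw row (any sequence of values) into a typed record."""
    if len(raw_row) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(raw_row)}")
    return {col.name: col.py_type(value) for col, value in zip(schema, raw_row)}


# Source 1: delimited text (values arrive as strings).
csv_text = "1,Alice,10.5\n2,Bob,0\n"
rows_from_file = [apply_schema(CUSTOMER_SCHEMA, row) for row in csv.reader(io.StringIO(csv_text))]

# Source 2: output of a custom SQL query (values arrive already typed).
conn = sqlite3.connect(":memory:")
query_result = conn.execute("SELECT 3, 'Carol', 7.25").fetchall()
conn.close()
rows_from_query = [apply_schema(CUSTOMER_SCHEMA, row) for row in query_result]

print(rows_from_file)
print(rows_from_query)
```

Both sources end up as records with the same field names and types, which is what makes the schema reusable across them.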
