Upgrading your data catalog experience: Field Statistics and Schema
Check out the new functionality in Asset Details for adding tags and classifications to fields, and view statistics on your fields in the Field Statistics tab.
Imagine that you have an analyst, Adam, who is working on a report for your e-commerce company. He needs to understand the data in a table that contains information about customer orders. He notices a column named "order_status" that has values like "pending", "shipped", "delivered", and "cancelled". He is not sure what these values mean or how they are updated. Usually, he’ll send a Slack message to Howard, the data engineer who created the table, for clarification.
Howard is too busy to handle many ad-hoc requests from various (occasionally upset) stakeholders at once, so it might take some time for him to get back to Adam. However, if Howard had a way to document and share this information, Adam would not need to reach out to him repeatedly to learn about the asset.
What Howard needs is a data catalog where he and the data team can tag data assets, add descriptions to them, and add classifications wherever sensitive data needs to be handled.
Introducing two new features within Asset Details: Schema and Field Statistics!
Field cataloging with Schema for Tables
You can now add custom tags to your fields to categorize them according to your business needs. For example, you can segment your assets by business domain or by geographical location. Adam may want to tag certain columns with Marketing Analysis.
We’ve also introduced Classifications for fields, which help you manage your data effectively and ensure security and compliance with regulatory requirements. When you connect a data source, we will also help you automatically classify your PII and Sensitive fields, such as customer names and bank information. You can also add or modify these classifications manually on each field.
We’ve also added the ability to add a description for each field. To help Adam understand the meaning of “order_status”, Howard might add: “...the current state of an order in the system and updated by different events that occur during the order lifecycle.”
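To make the idea of automatic classification more concrete, here is a minimal sketch of one possible approach: suggesting a classification from the field name alone. This is not how the platform's classifier works. The pattern list and the function name suggest_classifications are hypothetical and exist purely for illustration.

```python
import re
from typing import List

# Hypothetical name-based rules; a real classifier would likely also
# inspect sampled values and use a much richer rule set.
CLASSIFICATION_PATTERNS = {
    "PII": re.compile(r"(name|email|phone|address)", re.IGNORECASE),
    "Sensitive": re.compile(r"(bank|iban|card|account_number)", re.IGNORECASE),
}


def suggest_classifications(field_name: str) -> List[str]:
    """Return the labels whose pattern matches the given field name."""
    return [
        label
        for label, pattern in CLASSIFICATION_PATTERNS.items()
        if pattern.search(field_name)
    ]


for field in ["customer_name", "bank_account", "order_status"]:
    print(field, suggest_classifications(field))
```

Name matching like this is crude (a field called "filename" would be flagged as PII), which is one reason it is useful to be able to review and adjust classifications manually on each field.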
Field statistics at your fingertips
We’ve also introduced Field Statistics directly within Asset Details so that you can understand the distribution, data quality, and data types of your fields at a quick glance. One of the available statistics is the null percentage detected for each field in the table, which gives you a picture of how complete the data is before you use it for data analysis or modelling.
This view is also integrated with Data Quality, so that you can see how many tests were run against the field and whether any incidents were raised during these tests. Use this view to quickly understand the health of your data, and proactively add monitoring and notifications to your critical columns.
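For readers who want to see what a null percentage measures, here is a minimal sketch that computes it over a handful of in-memory rows. It assumes a null is a value that is None or missing from the row; the function name null_percentage and the sample orders are made up for illustration and are not part of the platform.

```python
from typing import Any, Dict, List


def null_percentage(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return, per field, the percentage of rows where the value is None or missing."""
    if not rows:
        return {}
    # Collect every field name that appears in at least one row.
    fields = sorted({key for row in rows for key in row})
    total = len(rows)
    result: Dict[str, float] = {}
    for field in fields:
        nulls = sum(1 for row in rows if row.get(field) is None)
        result[field] = 100.0 * nulls / total
    return result


# Made-up sample data: two of the four orders have no status recorded.
orders = [
    {"order_id": 1, "order_status": "pending"},
    {"order_id": 2, "order_status": None},
    {"order_id": 3, "order_status": "shipped"},
    {"order_id": 4, "order_status": None},
]

stats = null_percentage(orders)
print(stats)  # {'order_id': 0.0, 'order_status': 50.0}
```

A field with a high null percentage, like order_status here, is a signal to check how the column is populated before relying on it in a report.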
What’s next for the Catalog?
The most exciting Catalog feature coming soon is the Business Glossary, which will be integrated into our Catalog module. Your business teams will then be able to ditch those spreadsheets, create a dictionary for your most important metrics, and link those metrics to your data assets.
Looking for more information about what else we’re working on? Visit our Public roadmap.
Here’s a direct link to our Changelog where we share our newest updates to the platform.