Designing supervised ML workflows to make a Data catalog AI-powered

Saves customers the cost of a separate AI product; 2/3 of all our customers have purchased this software

A classification-algorithm web app for the Data Governance context

Why this product exists

Using our customers' data sets, we can build a Catalog that goes beyond being a directory and becomes a learning system.

Most modern enterprise organizations use microservices, which lead to fast, scalable products & efficient engineering. This distributed model also means each team manages its own data, giving rise to the need for a Data catalog that consolidates the systems & metadata.

Poor discoverability, data silos across different sources, and dependency on multiple teams for data access are the resulting problems that traditional catalogs have attempted to solve.

Now, using customer-provided metadata and labels such as PII type, the classification algorithm can be trained to reason about new data, improve over time, and offer insights with high confidence.
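To make the supervised-learning idea concrete, here is a minimal sketch, assuming the labeled metadata is simply column names tagged with a PII type. It uses multinomial Naive Bayes over column-name tokens as a stand-in classifier. The `PiiNaiveBayes` class, the `tokenize` helper, the labels (`EMAIL`, `PHONE`, `NONE`), and the training rows are all hypothetical and are not the product's actual model or data. The returned "confidence" is the model's posterior probability for its top label.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(column_name: str) -> list[str]:
    """Split a column name like 'customerEmail_addr' into lowercase tokens."""
    # Insert a space at camelCase boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", column_name)
    return [t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t]


class PiiNaiveBayes:
    """Hypothetical multinomial Naive Bayes over column-name tokens, Laplace-smoothed."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                      # smoothing constant
        self.class_counts = Counter()           # label -> number of labeled columns
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()

    def fit(self, examples: list[tuple[str, str]]) -> "PiiNaiveBayes":
        for column_name, label in examples:
            tokens = tokenize(column_name)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, column_name: str) -> tuple[str, float]:
        """Return (label, confidence), where confidence is the posterior probability."""
        tokens = tokenize(column_name)
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)  # log likelihood
            log_scores[label] = score
        # Turn log scores into probabilities, subtracting the max for numerical stability.
        max_log = max(log_scores.values())
        exp_scores = {k: math.exp(v - max_log) for k, v in log_scores.items()}
        best = max(exp_scores, key=exp_scores.get)
        return best, exp_scores[best] / sum(exp_scores.values())


# Hypothetical customer-provided labels.
training = [
    ("email", "EMAIL"),
    ("customer_email", "EMAIL"),
    ("emailAddress", "EMAIL"),
    ("phone_number", "PHONE"),
    ("mobilePhone", "PHONE"),
    ("order_id", "NONE"),
    ("created_at", "NONE"),
]
model = PiiNaiveBayes().fit(training)
label, confidence = model.predict("billing_email")
print(label, round(confidence, 2))  # EMAIL 0.72
```

Naive Bayes is used here only because it trains from simple counts and gives a probability per label with no dependencies. Richer signals, such as sampled column values, would likely be needed in practice. This sketch uses names alone to stay small.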

Tradeoff: choosing speed over polish to reduce rework later

At a high level, the product's audience is data analysts and data engineers.

Old catalogs vs Transcend

Old catalogs: descriptive → searchable → browsable

Transcend: labelable → trainable → improvable
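The labelable → trainable → improvable loop can be sketched as a routing step, assuming each prediction carries a confidence score. Confident predictions are applied automatically and the rest go to a human review queue. Each reviewer decision is then saved as a new labeled example for the next training run. `Prediction`, `ReviewLoop`, and the 0.8 threshold are hypothetical illustrations, and retraining itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    column: str
    label: str
    confidence: float


@dataclass
class ReviewLoop:
    """Hypothetical human-in-the-loop router for classifier output."""
    threshold: float = 0.8  # assumed cut-off; would be tuned against measured precision
    accepted: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)
    training_set: list = field(default_factory=list)

    def route(self, predictions: list[Prediction]) -> None:
        """Auto-accept confident predictions and queue the rest for a human."""
        for p in predictions:
            if p.confidence >= self.threshold:
                self.accepted.append(p)
            else:
                self.review_queue.append(p)

    def record_review(self, column: str, human_label: str) -> None:
        """A reviewer confirms or corrects a queued prediction; the label becomes training data."""
        self.review_queue = [p for p in self.review_queue if p.column != column]
        self.training_set.append((column, human_label))


loop = ReviewLoop(threshold=0.8)
loop.route([
    Prediction("customer_email", "EMAIL", 0.95),
    Prediction("contact", "PHONE", 0.55),
])
loop.record_review("contact", "EMAIL")  # reviewer corrects the model
print([p.column for p in loop.accepted], loop.training_set)
# ['customer_email'] [('contact', 'EMAIL')]
```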

How might we use labeled metadata & human-in-the-loop workflows to build a Catalog whose success is measured by classification precision & recall, not just discoverability?
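Measuring success by precision and recall means comparing model predictions against labels a human has confirmed. Below is a minimal sketch, assuming one PII class is scored at a time and every other label is treated as negative. The function name `precision_recall` and the sample labels are hypothetical.

```python
def precision_recall(y_true: list[str], y_pred: list[str], positive: str) -> tuple[float, float]:
    """Precision and recall for one class; all other labels count as negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly tagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly tagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical reviewer-confirmed labels vs. model predictions for five columns.
y_true = ["EMAIL", "EMAIL", "PHONE", "NONE", "EMAIL"]
y_pred = ["EMAIL", "NONE", "EMAIL", "EMAIL", "EMAIL"]

precision, recall = precision_recall(y_true, y_pred, "EMAIL")
print(round(precision, 2), round(recall, 2))  # 0.5 0.67
```

Precision answers how often a predicted `EMAIL` tag is correct (2 of 4 here). Recall answers how many true `EMAIL` columns were found (2 of 3).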

Why it matters

Data engineers & analysts confirm that Data catalogs fall short on accuracy without supervision.

The Catalog can also deliver business value, such as identifying users who are about to churn and acting swiftly to retain them, which saves money on customer acquisition costs.

Onboarding