Monday, October 7, 2024

Today And Tomorrow Of Data Analytics


Richard Meng, Roe AI CEO. Founder backed by Google AI ventures. Previously did research at UC Berkeley and led engineering teams at LinkedIn and Snowflake.

The evolution of data over the last 10-plus years in Silicon Valley has been fascinating. It runs from the University of California, Berkeley, the top academic place for big data research using supercomputers, to LinkedIn, the home of the greatest batch and nearline data ingestion technologies, and finally to Snowflake, the center of cloud data innovation. This time, however, it is a genuinely Cambrian moment for big, unstructured data.

It has become clearer that unstructured data opportunities will come from two broad personas: traditional industries and AI-native companies. Traditional industries deal with large volumes of terms and conditions and other communication held in documents and natural language. AI-native companies are those whose core business is built around GenAI. They produce unstructured data at scale, so they are more aware of the value of unstructured data insights and feel more urgency about extracting them.

Admittedly, because this market is still in its infancy, there are clear challenges and roadblocks to achieving unstructured data intelligence, such as:

1. Stale data stacks. The data stack is locked into structured data, including data governance, data connectors, data orchestrators, data lineage and data warehouses/lakehouses.

2. Mindset change. Data practitioners are not trained to do data modeling on unstructured data, and it's hard to cold-start an effort unless the enterprise has a painful enough use case (as opposed to a merely value-add one).

History has shown that the best way to adopt a new concept is to take iterative steps rather than one big leap to the destination. Instead of overthinking the ideal architecture or who should be doing what organizationally, focus on pain points and build an MVP to prove the value.

Even though there are many challenges ahead, it's important to note that this unstructured data revolution is the biggest of all. Nothing illustrates this better than a whirlwind tour of a SQL query that every data person knows: SELECT count(*).
