How Can AI Help Build Data Products?
Insights From Aleksandra Ershova,
Senior Product Manager at One Data
I began my career as a business analyst, where I supported various customers in improving their processes and answering critical business questions. Each project presented a unique and exciting challenge, highlighting the diverse range of use cases out there. Despite having different objectives, all these projects shared a common goal: deriving actionable insights from data. One of the primary challenges my team faced was how to accelerate the entire process from raw data to valuable insights.
Five years ago, I transitioned into a Product Manager role at One Data, focusing on building data products. This shift allowed me to revisit the challenges I had encountered in the past from a new perspective, with an emphasis on making data product development more efficient and less time-consuming.
Today, with the help of AI, we can approach daily tasks of data teams in innovative and transformative ways. According to Gartner, more than 80% of enterprises will have used generative AI APIs or deployed generative AI-enabled applications by 2026. In this article, I will explore several use cases of how AI can streamline the process of building data products.
Requirements specification
When you identify the need to build a data product, the first step is to specify the requirements. Instead of starting from scratch, you can leverage generative AI to get initial suggestions for requirements such as the data product schema, description, or a draft data contract. Although these AI-generated requirements will still require review and enhancement from your side, they offer an excellent starting point and drastically reduce the manual effort involved.
Large language models specialized in text generation are particularly well-suited for this use case. After you provide a prompt with the name and basic details of the desired data product, the model can generate preliminary requirement specifications based on this information. This approach not only accelerates the initial phase of data product development but also helps keep the requirements gathering process consistent and coherent.
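As a rough illustration of the prompt described above, the following Python sketch assembles one from a data product's name and basic details. The function name, the prompt wording, and the example product are hypothetical, and the call to an actual language model is left out on purpose because it depends on the provider you use.

```python
def build_requirements_prompt(name: str, details: str) -> str:
    """Assemble a prompt asking a language model for draft requirements.

    Hypothetical helper for illustration only; the wording is an assumption,
    not a template taken from any specific product.
    """
    return (
        "You are helping a data team specify a new data product.\n"
        f"Data product name: {name}\n"
        f"Known details: {details}\n"
        "Propose a first draft for human review:\n"
        "1. A one-paragraph description\n"
        "2. A schema (column name, type, meaning)\n"
        "3. A draft data contract (owner, refresh frequency, quality expectations)\n"
    )


# Example input (made up for this sketch).
prompt = build_requirements_prompt(
    "Customer Churn Overview",
    "Monthly churn rate per customer segment",
)
print(prompt)
```

The resulting text would be sent to whichever text generation model you use, and the answer treated as a draft to review and enhance rather than a finished specification.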
Data recommendation
Before initiating the development process, you need to find appropriate input data. Using the description and schema of the data product you need, together with the metadata of the assets available in your data landscape, you can search for assets suitable for the implementation.
In this scenario, an embedding model and a similarity search algorithm can generate recommendations from the schemas, names, and descriptions of your data tables. Simply put, similarity search converts your data into embeddings (vectors) and calculates the distances between them. A smaller distance means greater similarity, which indicates a higher likelihood that the recommended data aligns with what you require for your data product.
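The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `embed` function is a toy bag-of-words stand-in for a real embedding model, cosine similarity is used as the closeness measure (a higher score corresponds to a smaller distance), and the asset names and metadata are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query: str, assets: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank assets by similarity between their metadata and the query text."""
    query_vec = embed(query)
    scored = [(name, cosine_similarity(query_vec, embed(meta))) for name, meta in assets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical asset metadata (table name -> column names and description).
ASSETS = {
    "crm_customers": "customer id name email signup date",
    "web_orders": "order id customer id order date total amount",
    "hr_payroll": "employee id salary pay period department",
}

recommendations = recommend(
    "monthly revenue per customer: customer id, order date, amount", ASSETS
)
for name, score in recommendations:
    print(f"{name}: {score:.2f}")
```

With this toy data, the orders table ranks first and the customer table second, while the payroll table is left out of the top two. A production setup would typically store the embeddings in a vector index instead of recomputing them for every query.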
After recommendations are generated, you can review the suggestions to ensure they fit your purposes well and proceed with data product development.
Data generation
Sometimes it can be challenging for data teams to access real-world data due to issues such as data scarcity, privacy concerns, and regulatory restrictions. AI-driven data generation can help you bridge this gap by creating high-quality, privacy-preserving data assets. Whether used to supplement limited data, protect sensitive information, or enable safe data sharing across teams, generating data allows organizations to design and build data products more efficiently and securely.
One approach that has gained prominence in recent years is synthetic data generation. Models such as Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs) can generate data that mimics the statistical properties and patterns of real data. This data can then be used for data product building tasks, for example model training or data product prototyping.
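To make the notion of "mimicking statistical properties" concrete, here is a deliberately simplified Python sketch. It is not a GAN or a VAE: it fits an independent normal distribution to each numeric column and samples from it, so it reproduces only per-column means and spreads and ignores correlations between columns. It also offers no privacy guarantee by itself. The column names and values are invented.

```python
import random
import statistics


def fit_column_stats(rows: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Estimate (mean, standard deviation) for every numeric column."""
    stats = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        stats[column] = (statistics.mean(values), statistics.stdev(values))
    return stats


def generate_synthetic(
    stats: dict[str, tuple[float, float]], n: int, seed: int = 0
) -> list[dict[str, float]]:
    """Sample n synthetic rows, drawing each column independently."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        {column: rng.gauss(mu, sigma) for column, (mu, sigma) in stats.items()}
        for _ in range(n)
    ]


# Hypothetical "real" data (values are made up).
REAL_ROWS = [
    {"age": 23, "monthly_spend": 120.0},
    {"age": 35, "monthly_spend": 310.5},
    {"age": 41, "monthly_spend": 150.0},
    {"age": 52, "monthly_spend": 420.0},
    {"age": 29, "monthly_spend": 95.5},
]

column_stats = fit_column_stats(REAL_ROWS)
synthetic = generate_synthetic(column_stats, n=1000)
print(synthetic[0])
```

Generative models such as GANs and VAEs go further by learning the joint distribution of the columns, which is what makes their output usable for tasks like model training.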
Code generation
Once the specification is finalized and the necessary data are gathered or generated, the development process can begin. When using AI for code generation, both data experts and business users will gain significant advantages. Here are some example applications of GenAI during development:
- Generating Data Transformations: Based on the data product requirements you can generate transformations such as data cleaning, aggregation, and normalization.
- Debugging: Errors in your code can be identified and fixed through automated debugging techniques. These techniques can analyze code logic and suggest improvements based on learned patterns and best practices.
- Quality Checks: Generated code for automated checks can help ensure data integrity by validating inputs and verifying outputs, maintaining high data quality standards throughout the development lifecycle.
Large language models are also used for these code generation tasks. They can understand complex instructions and generate code tailored to specific programming languages. Moreover, there are specialized models designed specifically for generating SQL or Python code, ensuring compatibility with specific database systems or software applications.
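To give a feel for the kind of code involved in the applications listed above, here is a small hand-written Python sketch of a cleaning step, an aggregation, and a quality check. It is not actual model output; the column names, the sample rows, and the rules (drop rows without an amount, require unique order IDs and non-negative amounts) are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical raw input rows.
RAW_ORDERS = [
    {"order_id": 1, "country": " de", "amount": 120.0},
    {"order_id": 2, "country": "DE", "amount": None},
    {"order_id": 3, "country": "fr ", "amount": 80.0},
    {"order_id": 4, "country": "FR", "amount": 20.0},
]


def clean(rows: list[dict]) -> list[dict]:
    """Cleaning: drop rows without an amount and normalize country codes."""
    return [
        {**row, "country": row["country"].strip().upper()}
        for row in rows
        if row["amount"] is not None
    ]


def total_by_country(rows: list[dict]) -> dict[str, float]:
    """Aggregation: sum order amounts per country."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


def quality_issues(rows: list[dict]) -> list[str]:
    """Quality check: report duplicate IDs and missing or negative amounts."""
    issues = []
    seen_ids = set()
    for row in rows:
        if row["order_id"] in seen_ids:
            issues.append(f"duplicate order_id {row['order_id']}")
        seen_ids.add(row["order_id"])
        if row["amount"] is None or row["amount"] < 0:
            issues.append(f"invalid amount in order {row['order_id']}")
    return issues


cleaned = clean(RAW_ORDERS)
totals = total_by_country(cleaned)
print(totals)
print(quality_issues(RAW_ORDERS))
```

Running the check on the raw rows flags the order with the missing amount, while the cleaned rows pass without issues. Generated code of this kind still needs the same review as any other code before it goes into a data product.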
Conclusion: Why use AI in Data Product Building?
Integrating AI into your data product building process is not just a technological upgrade but a strategic move that can transform how your organization utilizes data. The main advantages of using AI for building data products are:
- Time and cost savings: By streamlining processes and improving efficiency, AI reduces the time and costs associated with data product development, freeing up time for data teams to focus on strategic and complex projects.
- Enhanced data product quality and insights: AI helps to ensure high-quality data by identifying and correcting errors, leading to more reliable insights.
- Lower barriers to entry for business users: Employees with broad business knowledge but fewer development skills can also start building data products, fostering a more inclusive and productive data-driven culture within your organization.
One Data is designed to save your data team valuable time and empower business users to extract meaningful insights from data products—adding significant value to your organization. By leveraging AI, One Data automates repetitive tasks, improves quality, and provides intelligent recommendations.
How do you apply AI for building data products in your organization? Feel free to reach out to me on LinkedIn to discuss how AI can transform the process of building data products!
Author:
Aleksandra Ershova
Senior Product Manager, One Data
Aleksandra Ershova took on the role of a Product Manager at One Data five years ago. Here, she is focused on building data products. This shift allowed her to revisit the challenges she had encountered in the past as a business analyst from a new perspective. She is putting an emphasis on making data product development more efficient and less time-consuming—mainly by developing new AI features for One Data with her team.