Pioneering Generative AI for Synthetic Data Solutions

Generative AI has drawn attention mainly for its capacity to produce text and images, but its potential reach is far wider. Data are generated whenever a patient undergoes medical treatment, a storm disrupts flight schedules or an individual interacts with a software application.

Image credits: news.mit.edu

Using generative AI to create realistic synthetic data for these scenarios can help organisations improve patient care, reroute flights or refine software platforms, particularly where real-world data are scarce or sensitive.

For the past three years, MIT spinout DataCebo has offered the Synthetic Data Vault (SDV), a generative software system designed to assist organisations in creating synthetic data for software testing and machine learning model training.

SDV is an open-source library for generating synthetic tabular data that has been downloaded over 1 million times, with more than 10,000 data scientists utilising it. The founders, Principal Research Scientist Kalyan Veeramachaneni and alumna Neha Patki ’15, SM ’16, attribute the company’s success to SDV’s transformative impact on software testing.

In 2016, Veeramachaneni’s group at the Data to AI Lab introduced a suite of open-source generative AI tools to help organisations create synthetic data that mirrors the statistical properties of real data. Using synthetic data allows companies to protect sensitive information while maintaining the statistical relationships between data points. Synthetic data can also be used to test how new software performs before its public release.

The inspiration for SDV came from the group’s work with companies willing to share their data for research purposes. This diverse exposure across industries helped them realise the versatility of their tools.

In 2020, DataCebo was founded to enhance SDV features for larger organisations. Since then, the range of applications has been diverse. For instance, DataCebo’s flight simulator enables airlines to plan for rare weather events in ways previously impossible using only historical data. In another application, SDV users synthesised medical records to predict health outcomes for cystic fibrosis patients.

In 2021, Kaggle hosted a competition for data scientists that used SDV to create its synthetic datasets. It attracted roughly 30,000 participants, who built solutions and predicted outcomes based on this realistic data.

Although its tools are used for various purposes, DataCebo is focused primarily on expanding its presence in software testing. Veeramachaneni explained, “People need data to test these software applications. Traditionally, developers manually write scripts to create synthetic data. With generative models created using SDV, you can learn from a sample of data collected and then sample a large volume of synthetic data (which has the same properties as real data) or create specific scenarios and edge cases and use the data to test your application.”
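The "learn from a sample, then sample a large volume" workflow Veeramachaneni describes can be illustrated with a deliberately simple sketch. It does not use SDV and is not DataCebo's method: it assumes a two-column numeric table, fits a plain bivariate Gaussian to it, and draws new rows from that fit. The function names (`fit_gaussian`, `sample_synthetic`, `correlation`) and the stand-in "real" sample are hypothetical and exist only for this illustration.

```python
import math
import random
from statistics import mean


def fit_gaussian(rows):
    """Learn column means and the 2x2 sample covariance from (x, y) rows."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return {"mean": (mx, my), "cov": (sxx, sxy, syy)}


def sample_synthetic(model, num_rows, rng):
    """Draw new rows from the fitted Gaussian via a 2x2 Cholesky factor."""
    mx, my = model["mean"]
    sxx, sxy, syy = model["cov"]
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(max(syy - l21 ** 2, 0.0))
    rows = []
    for _ in range(num_rows):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return rows


def correlation(model):
    """Pearson correlation implied by a fitted model."""
    sxx, sxy, syy = model["cov"]
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)

# Stand-in for a small "real" sample (assumption: two correlated numeric columns).
real_rows = []
for _ in range(200):
    x = rng.gauss(50, 10)
    real_rows.append((x, 2 * x + rng.gauss(0, 5)))

model = fit_gaussian(real_rows)                       # learn from the sample
synthetic_rows = sample_synthetic(model, 10_000, rng)  # sample a large volume
synthetic_model = fit_gaussian(synthetic_rows)         # re-measure the synthetic data

print("real corr:     ", round(correlation(model), 3))
print("synthetic corr:", round(correlation(synthetic_model), 3))
```

The synthetic rows contain none of the original records, yet their means and correlation stay close to those of the sample. Real enterprise tables have mixed types, constraints and multi-table relationships that a single Gaussian cannot capture, which is the gap that purpose-built tools such as SDV aim to fill.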

Patki added, “It is common for industries to have sensitive data in some capacity. Hence, synthetic data is always better from a privacy perspective.”

DataCebo believes it is advancing the field of what it calls “synthetic enterprise data”, meaning data generated from user behaviour on large companies’ software applications.

Veeramachaneni emphasises that “Enterprise data of this kind is complex, and there is no universal availability, unlike language data. When folks use our publicly available software and report back if it works on a certain pattern, we learn many of these unique patterns, allowing us to improve our algorithms. From one perspective, we are building a corpus of these complex patterns, which for language and images is readily available.”

DataCebo has also released tools to enhance SDV’s utility, including the SDMetrics library to assess the “realism” of generated data and SDGym to compare models’ performance.

Their tools aim to ensure organisations trust this new data. Veeramachaneni stated, “The tools over programmable synthetic data allow enterprises to insert their specific insight and intuition to build more transparent models.”

As companies across industries embrace AI and other data science tools, DataCebo is helping them do so transparently and responsibly. “In the next few years, synthetic data from generative models will transform all data work,” Veeramachaneni asserted. “We believe 90% of enterprise operations can be done with synthetic data.”