Synthetic data has existed in various forms for decades, from computer games such as flight simulators to physics simulations that depict everything from atoms to galaxies. Now, synthetic data is being applied in industries such as healthcare to solve real-world AI challenges.
Synthetic data context
The advancement of AI continues to run into several implementation obstacles. For example, AI development requires large data sets that deliver trustworthy findings, are free of bias, and comply with increasingly strict data privacy regulations. Amid these challenges, annotated data created by computer simulations or programs has emerged as an alternative to genuine data. This artificially generated data, known as synthetic data, can help resolve privacy concerns and reduce bias because it can be produced with a diversity that reflects the real world.
In healthcare, for example, practitioners use synthetic medical images to train AI systems while maintaining patient confidentiality. The virtual care firm Curai, for instance, used 400,000 synthetic medical cases to train a diagnosis algorithm. Retailers such as Caper, meanwhile, use 3D simulations to create a synthetic dataset of a thousand photographs from as few as five product shots.
According to a Gartner study on synthetic data released in June 2021, by 2030 most of the data used in AI development will be artificially generated through rules, statistical models, simulations, or other techniques.
Synthetic data helps preserve privacy and prevent data breaches. For example, a hospital or corporation could give a developer high-quality synthetic medical data to train an AI-based cancer diagnosis system, data that is as complex as the real-world data the system is meant to interpret. The developers then have quality datasets to use when designing and building the system, and the hospital network does not risk exposing sensitive patient medical data.
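One simple way to produce synthetic records is to summarize each numeric column of a real dataset and then sample new rows from those summaries, so that no original row is handed over. The sketch below assumes every column is numeric and roughly normally distributed; the column names and function names are hypothetical. This naive approach ignores correlations between columns and does not by itself guarantee privacy, so dedicated tools rely on more sophisticated statistical or generative models.

```python
import random
import statistics
from typing import Dict, List, Tuple

Record = Dict[str, float]
# Hypothetical model: column name -> (mean, sample standard deviation)
Model = Dict[str, Tuple[float, float]]


def fit_columns(records: List[Record]) -> Model:
    """Summarize each numeric column by its mean and standard deviation.

    Assumes at least two records and that every record has the same columns.
    """
    model: Model = {}
    for col in records[0]:
        values = [r[col] for r in records]
        model[col] = (statistics.mean(values), statistics.stdev(values))
    return model


def sample_records(model: Model, n: int, seed: int = 0) -> List[Record]:
    """Draw n synthetic records, sampling each column independently
    from a Gaussian with the fitted mean and standard deviation."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in model.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Toy "real" records with hypothetical columns
    real = [
        {"age": 40, "systolic_bp": 120},
        {"age": 60, "systolic_bp": 140},
        {"age": 50, "systolic_bp": 130},
    ]
    model = fit_columns(real)
    for row in sample_records(model, n=3):
        print(row)
```

Only the per-column summaries leave the fitting step; the developer receives the sampled rows rather than the originals.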
Synthetic data can also let buyers of data obtain it at a lower price than traditional services charge. According to Paul Walborsky, co-founder of A.I. Reverie, one of the first dedicated synthetic data businesses, a single image that costs $6 from a labeling service can be artificially generated for six cents, a hundredfold reduction. Synthetic data also paves the way for augmented data, which involves adding new data to an existing real-world dataset. For example, developers could rotate or brighten an existing image to create a new one.
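To illustrate the rotate-and-brighten idea, the sketch below treats a grayscale image as a list of pixel rows with values from 0 to 255 and derives two new images from one original. The function names and the brightness offset of 40 are hypothetical choices for illustration; real pipelines typically use an image-processing library rather than hand-written loops.

```python
from typing import List

# A grayscale image as rows of pixel intensities (0-255), row-major.
Image = List[List[int]]


def rotate_90_clockwise(img: Image) -> Image:
    """Return a new image rotated 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise rotation.
    return [list(row) for row in zip(*img[::-1])]


def brighten(img: Image, delta: int) -> Image:
    """Return a new image with every pixel shifted by delta, clamped to 0-255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def augment(img: Image) -> List[Image]:
    """Derive extra training images from one original (hypothetical recipe)."""
    return [rotate_90_clockwise(img), brighten(img, 40)]


if __name__ == "__main__":
    original = [[10, 20, 30], [40, 50, 60]]
    for new_image in augment(original):
        print(new_image)
```

Each augmented image keeps the original's label, which is why this technique can enlarge an annotated dataset without new labeling work.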
Lastly, because of privacy concerns and government restrictions, personal information stored in databases is becoming increasingly regulated and complex to handle, making it harder to use real-world information to create new programs and platforms. Synthetic data could give developers a way to replace highly sensitive data.
Implications of synthetic data
Wider implications of synthetic data include:
- The accelerated development of new AI systems, in both scale and diversity, that improve processes in numerous industries and disciplines.
- Enabling organizations to share information more openly and teams to collaborate and operate more efficiently.
- Developers and data professionals being able to email or carry large synthetic data sets on their laptops, confident that critical data is not at risk.
- Fewer database cybersecurity breaches, since authentic data will need to be accessed or shared less often.
- Governments gaining more freedom to implement stricter data management legislation without worrying about impeding industry development of AI systems.
Questions to comment on
- What other industries could benefit from synthetic data?
- What regulations should the government implement concerning how synthetic data is created, used, and deployed?