Parallel Paths to AI
The Debate Between Synthetic and Real-World Data
Self-driving cars and humanoid robots that can walk, talk, and work alongside us are just two of the amazing ways in which AI promises to change the world in the near future.
The Need for Understanding the World
To operate safely and effectively, these physical AI systems must be able to understand the world around them.
NVIDIA’s Cosmos Platform
At this year’s Consumer Electronics Show (CES) in Las Vegas, NVIDIA announced the launch of its Cosmos platform, designed to accelerate the development of physical AI systems.
Described as a "ChatGPT moment for robotics," Cosmos can generate huge amounts of synthetic data: data that, despite being artificially created, is close enough to the real world that robots, self-driving cars, and other physical AI algorithms should be able to learn from it.
The Debate: Synthetic vs Real-World Data
However, some people believe that no amount of synthetic data will ever be able to fully simulate every real-world scenario that machines will need to be prepared for. This is why Tesla, for example, has spent many years collecting real-world data with its sensor-packed cars. CEO Elon Musk tweeted, “Two sources of data scale infinitely: synthetic data, which has an ‘is it true?’ problem and real-world video, which does not.”
How Synthetic and Real-World Data Differ
In autonomous driving systems, visual data such as images and video is used to train the algorithms that determine how a vehicle reacts to different conditions and situations on the road. This data can be captured by cameras mounted on vehicles (real-world data), or it can be generated by AI models using patterns learned from studying real-world data (synthetic data).
Advantages and Disadvantages
Synthetic data can often be produced much more quickly and cost-effectively than real-world data. No one has to go out and gather it; it is simply generated by machines. This can also bring safety benefits. Testing self-driving cars on public roads, for example, carries an element of risk, which can be eliminated if journeys are simulated instead.
Situations, environments, and many other variables can also be customized, rather than waiting for the right conditions to occur in the real world. For example, researchers can simulate rare weather events, test autonomous vehicles in dangerous scenarios, or model complex manufacturing defects without real-world risks or delays.
Generating synthetic data can also reduce or eliminate the privacy and data protection concerns that apply to real-world data, as there’s no danger of sensitive personal data being inadvertently stored or compromised.
Real-world data, on the other hand, has the undeniable advantage of being more authentic, as Musk points out. It is more likely to capture the chaotic, hard-to-predict human behaviors that are difficult to generate synthetically.
Weighing the Options
In truth, both real-world and synthetic data are likely to be vitally important for training the coming generation of physical AI systems, such as autonomous vehicles and robots. Each offers distinct advantages and challenges, and a hybrid approach that combines them is likely to be the best path to success.
Conclusion
The trick will be identifying which is most appropriate for specific use cases. For example, synthetic data may prove more useful for tasks that involve processing sensitive information or operating in dangerous conditions. Real-world data, on the other hand, might be best for capturing dynamic human behavior, or for situations where chaotic, unforeseen events are likely.
This means that AI projects taking a balanced approach, led by people who understand how synthetic and real-world data can complement rather than compete with each other, are more likely to create real business value.
FAQs
- What is the difference between synthetic and real-world data?
Synthetic data is artificially created, while real-world data is collected from actual events or scenarios.
- What are the advantages of synthetic data?
Synthetic data can be produced more quickly and cost-effectively, lets researchers simulate rare or dangerous scenarios without real-world risk, and can reduce or eliminate concerns around privacy and data protection.
- What are the advantages of real-world data?
Real-world data is more authentic and can capture chaotic, hard-to-predict human behaviors.
- Can AI projects adopt a hybrid approach?
Yes. AI projects can combine synthetic and real-world data, drawing on the strengths of each to build the most effective training data.
