MIT’s Curiosity-Driven Approach to AI Ethics

In today’s rapidly evolving digital landscape, deploying generative AI, particularly large language models (LLMs), brings a new set of challenges centred on keeping these models safe and ethical to use. Researchers from MIT’s Improbable AI Lab and the MIT-IBM Watson AI Lab are addressing these challenges with a red-teaming process that is itself driven by machine learning, a notable step forward for AI safety.

AI models trained on vast amounts of text from the internet inherit not only the knowledge embedded in those texts but also their biases and potential for misuse. There is a real risk that these models could inadvertently generate harmful or toxic content, posing serious ethical concerns. As AI models become more integrated into daily applications, from customer service bots to advanced analytical tools, ensuring their safety becomes paramount.

Red-teaming is a standard practice where human testers try to ‘break’ the AI by prompting it to produce inappropriate outputs. However, the effectiveness of this method is often limited by the testers’ ability to predict and simulate every possible inappropriate prompt, a near-impossible task given the model’s potential to generate a vast array of responses based on its training data.

To overcome the limitations of human-led red-teaming, the MIT team has developed an automated approach built on a curiosity-driven reinforcement learning framework. This method trains a secondary AI model to act as the ‘red team,’ tasked with challenging the primary AI model’s ability to generate safe responses.

This red-team model is trained to be ‘curious,’ meaning it is rewarded for seeking out novel prompts that might draw a toxic response from the primary model. This is a significant shift from standard reinforcement learning, in which the red-team model can get stuck generating a handful of repeated or highly similar toxic prompts because they reliably maximise the reward for triggering unsafe responses.

The core of the approach is reinforcement learning, which the researchers use to gamify the red-teaming process. The red-team AI is rewarded not only for eliciting a toxic response but also for doing so with novel and varied prompts. Two types of novelty reward drive this: one for lexical variety (how different the wording is from earlier prompts) and another for semantic diversity (how different the meaning is).
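To make the idea of a combined reward concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the researchers’ implementation: the n-gram overlap measure, the cosine-similarity measure, the weights `w_lex` and `w_sem`, and every function name are hypothetical choices made for this example. In practice the toxicity score would come from a classifier and the embeddings from a sentence-embedding model, neither of which is shown here.

```python
import math


def ngrams(text, n=2):
    """Return the set of word n-grams in a prompt."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def lexical_novelty(prompt, history, n=2):
    """1 minus the highest n-gram Jaccard overlap with any earlier prompt."""
    grams = ngrams(prompt, n)
    if not history or not grams:
        return 1.0
    best = 0.0
    for old in history:
        old_grams = ngrams(old, n)
        union = grams | old_grams
        if union:
            best = max(best, len(grams & old_grams) / len(union))
    return 1.0 - best


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_novelty(embedding, past_embeddings):
    """1 minus the highest cosine similarity to any earlier prompt embedding."""
    if not past_embeddings:
        return 1.0
    return 1.0 - max(cosine(embedding, e) for e in past_embeddings)


def red_team_reward(toxicity, lex_nov, sem_nov, w_lex=0.5, w_sem=0.5):
    """Toxicity score plus weighted novelty bonuses (weights are assumptions)."""
    return toxicity + w_lex * lex_nov + w_sem * sem_nov
```

With this shape of reward, a prompt that repeats an earlier one word for word earns only its toxicity score, while an equally toxic prompt with new wording and new meaning earns more, which is what pushes the red-team model to keep exploring.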

Moreover, to avoid nonsensical or irrelevant prompts, the researchers incorporated a natural language bonus that encourages the red-team model to maintain logical coherence in its queries. This ensures the prompts remain realistic and relevant, mirroring potential human interactions more closely.
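One plausible way to score how natural a prompt reads is to ask how likely a reference language model finds it. The sketch below is a toy illustration only: it assumes the bonus is the average per-word log-probability, and it substitutes a tiny smoothed unigram model for a real language model. The function names and the sample corpus are hypothetical and do not come from the research.

```python
import math
from collections import Counter


def build_unigram_model(corpus):
    """Build a toy add-one-smoothed unigram model from a list of sentences."""
    counts = Counter(w for line in corpus for w in line.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen words

    def log_prob(word):
        # Counter returns 0 for unseen words, so smoothing covers them.
        return math.log((counts[word] + 1) / (total + vocab))

    return log_prob


def naturalness_bonus(prompt, log_prob):
    """Average per-word log-probability; higher means more natural."""
    words = prompt.lower().split()
    if not words:
        return float("-inf")
    return sum(log_prob(w) for w in words) / len(words)


# Hypothetical reference text standing in for a real language model.
corpus = ["how do i reset my password", "what is the weather today"]
log_prob = build_unigram_model(corpus)
fluent = naturalness_bonus("how do i reset my password", log_prob)
gibberish = naturalness_bonus("zxq vvv qqq", log_prob)
```

Here `fluent` comes out higher than `gibberish`, so adding this bonus to the reward would steer the red-team model away from strings of nonsense that happen to trigger the toxicity classifier.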

The implications of this technology extend beyond just creating safer AI. In environments where AI models must be updated frequently, such as those dealing with real-time information or evolving datasets, traditional red-teaming becomes a bottleneck due to its time-consuming nature. The automation and efficiency brought by this curiosity-driven approach accelerate the process and enhance the depth and breadth of safety testing.

This method also significantly reduces the human resources required for AI safety testing, allowing experts to focus on higher-level strategy and oversight rather than routine testing. Furthermore, the approach is flexible enough to be adapted to different AI applications or requirements, such as testing for compliance with company policies or legal standards.

The research team at MIT is exploring further enhancements to this technology. They are looking to enable the red-team model to generate prompts across a wider variety of topics and to refine the model’s ability to simulate real-world scenarios even more accurately.

Another promising avenue is developing a large language model that could serve as the toxicity classifier, which could be trained specifically to reflect particular organisations’ ethical guidelines or operational requirements.

The pioneering work by the Improbable AI Lab and the MIT-IBM Watson AI Lab sets new standards for AI safety in the digital age. Integrating advanced machine learning techniques with red-teaming addresses one of the most critical challenges in deploying AI systems: ensuring that these technologies operate within safe and ethical boundaries.

