Reducing Bias in Training Data for Improved Machine Learning

As companies and decision-makers increasingly look to machine learning to make sense of large amounts of data, ensuring the quality of training data used in machine learning problems is becoming critical. That data is coded and labelled by human data annotators—often hired from online crowdsourcing platforms—which raises concerns that data annotators inadvertently introduce bias into the process, ultimately reducing the credibility of the machine learning application’s output.

A team of U.S. researchers has developed a new scientific method to screen human data annotators for bias, ensuring high-quality data inputs for machine learning tasks. The researchers have also designed an online platform that allows for scaling up the screening process.

“We have created a very systematic and scientific method for finding good data annotators. This much-needed approach will improve the outcomes and realism of machine learning decisions around public opinion, online narratives and perception of messages.”

– Lead Researcher

The researchers investigated how five common measures of attitudes and knowledge relating to Brexit could be combined to create an anonymized profile of the data annotators who are likely to label data for machine learning applications in the most accurate, bias-free way. They tested 100 prospective data annotators from 26 countries using several thousand social media posts from 2019.

The lead researcher stated that the team wanted to use machine learning to detect what people were talking about. In the case of their study, were people talking about Brexit in a positive or negative way? Were data annotators likely to label data in ways that merely reflected their own beliefs about leaving or staying in the EU because bias clouded their performance? Data annotators who can put aside their own beliefs will provide more accurate data labels, and the research helps find them.
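One simple way to see how an annotator's beliefs might cloud their labels is to compare their accuracy on positive and negative posts against a trusted reference label. The sketch below is an illustration only, not the researchers' screening method: the toy data, the annotator IDs and the helper names `accuracy_by_gold_class` and `accuracy_gap` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (annotator_id, gold_label, annotator_label).
# "pos"/"neg" stand for positive or negative sentiment toward Brexit.
# None of these rows come from the study.
LABELS = [
    ("a1", "pos", "pos"),
    ("a1", "neg", "neg"),
    ("a1", "pos", "pos"),
    ("a1", "neg", "neg"),
    ("a2", "pos", "neg"),
    ("a2", "neg", "neg"),
    ("a2", "pos", "neg"),
    ("a2", "neg", "neg"),
]


def accuracy_by_gold_class(rows):
    """Return {annotator: {gold_label: accuracy}} for each gold class."""
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(int))
    for annotator, gold, given in rows:
        totals[annotator][gold] += 1
        hits[annotator][gold] += int(gold == given)
    return {
        a: {g: hits[a][g] / totals[a][g] for g in totals[a]}
        for a in totals
    }


def accuracy_gap(per_class):
    """Absolute accuracy difference between the two gold classes."""
    return abs(per_class["pos"] - per_class["neg"])


scores = accuracy_by_gold_class(LABELS)
gaps = {a: accuracy_gap(s) for a, s in scores.items()}
```

In this toy data, annotator `a2` labels every post as negative, so their accuracy differs sharply between the two classes, while `a1` shows no gap. A large gap is only a hint of one-sided labelling; it says nothing about the attitude and knowledge measures the study actually used.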

The team’s method is scalable in two ways. First, it cuts across domains, affecting data quality for machine learning problems related to transportation, climate and robotics decisions, as well as health care and geopolitical narratives relevant to national security. Second, the team’s open-source, interactive web-based platform scales up the measurement of attitudes and beliefs, allowing for profiling of larger groups of prospective data annotators and faster identification of the best hires.

This research strongly indicates that data annotators’ morals, prejudices and prior knowledge of the narrative in question significantly impact the quality of labelled data and, consequently, the performance of machine learning models. Machine learning projects that rely on labelled data to understand narratives must qualitatively assess their data annotators’ worldviews if they are to make definitive statements about their results.

As reported by OpenGov Asia, U.S. researchers have also developed a new artificial intelligence (AI) programming language to reduce bias in AI algorithms; it can assess the fairness of algorithms more exactly, and more quickly, than available alternatives. Their Sum-Product Probabilistic Language (SPPL) is a probabilistic programming system.

Probabilistic programming is an emerging field at the intersection of programming languages and artificial intelligence that aims to make AI systems much easier to develop, with early successes in computer vision, common-sense data cleaning, and automated data modelling. Probabilistic programming languages make it much easier for programmers to define probabilistic models and carry out probabilistic inference, that is, to work backwards to infer probable explanations for observed data.

SPPL gives fast, exact solutions to probabilistic inference questions. These inference results are based on SPPL programs that encode probabilistic models of what kinds of applicants are likely, a priori, and also how to classify them. Fairness questions that SPPL can answer include “Is there a difference between the probability of recommending a loan to an immigrant and nonimmigrant applicant with the same socioeconomic status?” or “What’s the probability of a hire, given that the candidate is qualified for the job and from an underrepresented group?”
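To make the first fairness question concrete, the sketch below answers it exactly for a tiny, made-up model. It does not use SPPL or its API: brute-force enumeration over a small discrete distribution stands in for SPPL's inference, and every probability, the decision rule `p_approve` and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical prior over applicants; all numbers are made up for illustration.
P_GROUP = {"immigrant": Fraction(1, 4), "nonimmigrant": Fraction(3, 4)}
P_STATUS = {"low": Fraction(1, 2), "high": Fraction(1, 2)}


def p_approve(group, status):
    """Hypothetical classifier: probability of recommending a loan."""
    base = Fraction(4, 5) if status == "high" else Fraction(2, 5)
    penalty = Fraction(1, 10) if group == "immigrant" else Fraction(0)
    return base - penalty


def joint():
    """Yield every (group, status, approved) outcome with its exact probability."""
    for group, status in product(P_GROUP, P_STATUS):
        weight = P_GROUP[group] * P_STATUS[status]
        p = p_approve(group, status)
        yield (group, status, True), weight * p
        yield (group, status, False), weight * (1 - p)


def prob(event, given=lambda outcome: True):
    """Exact P(event | given), summed over the enumerated joint distribution."""
    num = sum(p for o, p in joint() if given(o) and event(o))
    den = sum(p for o, p in joint() if given(o))
    return num / den


def approval_rate(group, status):
    """P(approved | group, status)."""
    return prob(lambda o: o[2], lambda o: o[0] == group and o[1] == status)


# Difference in approval probability at the same socioeconomic status.
gap_high = approval_rate("nonimmigrant", "high") - approval_rate("immigrant", "high")
```

Using `Fraction` keeps the arithmetic exact, so `gap_high` is exactly 1/10 in this toy model rather than a sampled estimate. Enumeration only works for very small discrete models; it is used here purely to show what an exact conditional-probability query looks like.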

