
The Development of a Machine Learning Model that Understands Object Relationships

Many deep learning models struggle to see the world the way humans do, as a collection of objects and the relationships between them. Most models do not understand the entangled relationships between individual objects. Without knowledge of these relationships, a robot designed to assist someone in a kitchen would have difficulty following commands such as “grab the spatula on the left side of the stove and place it on the cutting board.”

In an effort to solve this problem, MIT researchers have developed a model that understands the underlying relationships between objects in a scene. Their model represents individual relationships one at a time and then combines these representations to describe the overall scene. This allows the model to generate more accurate images from text descriptions, even when the scene contains multiple objects arranged in different relationships to each other.

This work can be applied in situations where industrial robots need to perform complex, multi-step manipulation tasks, such as stacking items in a warehouse or assembling devices. It also brings the field one step closer to enabling machines that can learn from and interact with their environment more like humans do.

“When I look at a table, I can’t say that there is an object at XYZ location. Our minds do not work that way. In our minds, when we understand a scene, we really understand it based on the relationships between the objects. We think that by building a system that can understand the relationships between objects, we can use that system to more effectively manipulate and change our environments.”

– Yilun Du, PhD student, Computer Science and Artificial Intelligence Laboratory, and co-lead author

The framework the researchers developed can generate an image of a scene based on a text description of objects and their relationships, such as “A wooden table to the left of a blue stool. A red bench to the right of a blue stool.”

Their system breaks these sentences into two smaller pieces, one for each individual relationship, and models each piece separately. The pieces are then combined through an optimisation process that generates an image of the scene.

The researchers used a machine learning technique called energy-based models to represent the individual object relationships in a scene description. This technique allows them to use one energy-based model to encode each relational description, then assemble them in a way that infers all objects and relationships.
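The idea of one energy function per relation, combined by optimisation, can be illustrated with a deliberately tiny sketch. This is not the researchers' implementation: it assumes each object is just a single x-coordinate rather than an image, and the names `left_of_energy`, `compose_and_optimize` and the `MARGIN` value are hypothetical. Each relation contributes an energy that is low when the relation holds, the energies are summed, and gradient descent searches for a layout that satisfies all of them at once.

```python
import numpy as np

# Toy assumption: a "scene" is one x-coordinate per object, not an image.
MARGIN = 1.0  # assumed minimum gap for "left of" to count as satisfied


def left_of_energy(pos, a, b):
    """Energy is zero when object a sits at least MARGIN to the left of b."""
    return max(0.0, pos[a] - pos[b] + MARGIN) ** 2


def left_of_grad(pos, a, b):
    """Gradient of left_of_energy with respect to every coordinate."""
    grad = np.zeros_like(pos)
    violation = max(0.0, pos[a] - pos[b] + MARGIN)
    grad[a] = 2.0 * violation
    grad[b] = -2.0 * violation
    return grad


def total_energy(pos, relations):
    """Composed energy: the sum of the per-relation energies."""
    return sum(left_of_energy(pos, a, b) for a, b in relations)


def compose_and_optimize(relations, n_objects, steps=500, lr=0.1, seed=0):
    """Gradient descent on the summed energy from a random starting layout."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=n_objects)
    for _ in range(steps):
        total_grad = sum(left_of_grad(pos, a, b) for a, b in relations)
        pos = pos - lr * total_grad
    return pos


# "Table left of stool" and "bench right of stool" (i.e. stool left of bench).
TABLE, STOOL, BENCH = 0, 1, 2
relations = [(TABLE, STOOL), (STOOL, BENCH)]
scene = compose_and_optimize(relations, n_objects=3)
print(scene, total_energy(scene, relations))
```

Because the relations are only ever combined by adding their energies, a description with three or four relations needs no new machinery in this sketch, only a longer `relations` list.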

The system also works in reverse: with an image, it can find text descriptions that correspond to the relationships between objects in the scene. In addition, their model can be used to edit an image by rearranging the objects in the scene to match a new description.
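Running the same idea in reverse can be sketched in a similarly simplified way: given a fixed scene, score each candidate description by its summed energy and keep the one with the lowest value. As before, this is an illustrative toy under stated assumptions, not the authors' code; the scene coordinates, the candidate list, and the helper names `description_energy` and `best_description` are all hypothetical.

```python
# Toy assumption: the "image" is reduced to one x-coordinate per object.
scene = {"table": -2.0, "stool": 0.0, "bench": 2.0}
MARGIN = 1.0  # assumed minimum gap for "left of" to count as satisfied


def left_of_energy(scene, a, b):
    """Energy is zero when object a sits at least MARGIN to the left of b."""
    return max(0.0, scene[a] - scene[b] + MARGIN) ** 2


def description_energy(scene, description):
    """Sum the energies of every (a, b) 'a left of b' relation."""
    return sum(left_of_energy(scene, a, b) for a, b in description)


# Hypothetical candidate descriptions, each a list of "left of" relations.
candidates = {
    "table left of stool; stool left of bench": [("table", "stool"), ("stool", "bench")],
    "stool left of table; stool left of bench": [("stool", "table"), ("stool", "bench")],
    "bench left of stool; table left of stool": [("bench", "stool"), ("table", "stool")],
}


def best_description(scene, candidates):
    """Return the candidate text whose relations have the lowest total energy."""
    return min(candidates, key=lambda text: description_energy(scene, candidates[text]))


best = best_description(scene, candidates)
print(best)
```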

The researchers compared their model with other deep learning methods that were given text descriptions and tasked with generating images showing the associated objects and their relationships. In each case, their model outperformed the baselines.

They also asked people to evaluate whether the images generated matched the original scene description. In the most complex examples, where descriptions included three relationships, 91% of participants concluded that the new model performed better.

While these initial results are encouraging, the researchers would like to see how their model performs on more complex real-world images, with noisy backgrounds and objects blocking each other. They are also interested in eventually incorporating their model into robotic systems, allowing a robot to derive object relationships from videos and then apply this knowledge to manipulate objects in the world.

