
MIT’s CSAIL Enhances Robotic Object Manipulation


Inspired by humans’ ability to handle unfamiliar objects, a team at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed Feature Fields for Robotic Manipulation (F3RM). The system blends 2D images with foundation model features to build 3D scenes, helping robots identify and grasp nearby objects. F3RM stands out for its ability to interpret open-ended language prompts from humans, which makes it especially useful in real-world environments crowded with objects, such as warehouses and households.

Image credits: news.mit.edu

F3RM endows robots with the capability to interpret and act on open-ended textual instructions expressed in natural language, thereby enhancing their object manipulation skills. Consequently, these machines can comprehend less specific human requests and accomplish the intended tasks. For instance, when a user instructs the robot to “pick up a tall mug,” the robot can efficiently identify and handle an item that best matches this description.

Ge Yang, a postdoc at the National Science Foundation AI Institute for Artificial Intelligence and Fundamental Interactions and MIT CSAIL, stresses how hard it is to build robots that generalise in real-world scenarios. The goal is to give robots human-like flexibility, so that they can grasp and place objects even when encountering them for the first time.

This method could enhance robots’ ability to pick items in busy fulfilment centres characterised by clutter and unpredictability. Robots in such warehouses are often tasked with matching textual descriptions to objects, regardless of packaging variations, to ensure accurate order shipping.

For instance, in vast online retail fulfilment centres housing millions of items, many of which may be unfamiliar to robots, F3RM’s advanced spatial and semantic perception could help robots locate, place, and package items efficiently, helping factory workers ship customers’ orders faster.

Moreover, F3RM’s versatility extends to urban and household settings, where personalised robots can identify and pick specific items. The system aids robots in understanding their physical and perceptual surroundings.

Phillip Isola, MIT associate professor of electrical engineering and computer science and CSAIL principal investigator, highlights that combining the visual recognition of foundation models with the 3D scene representation of radiance fields is highly beneficial for robotic tasks, particularly those involving 3D object manipulation in varied environments.

F3RM begins building its spatial understanding by taking pictures with a camera mounted on a selfie stick. The camera captures 50 images from different poses, which are used to build a neural radiance field (NeRF), a deep learning method that turns 2D images into a 3D scene. Together, these RGB images yield a “digital twin” of the surroundings, offering a 360-degree view of the scene.
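To make the NeRF step more concrete, here is a minimal sketch of the standard volume-rendering rule that NeRF methods use to turn per-point predictions into a pixel colour. It assumes a network has already predicted a density and an RGB colour at sample points along one camera ray; the function name `render_ray` and the toy values are illustrative, and this is not F3RM’s own code.

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Alpha-composite samples along one camera ray (standard NeRF quadrature).

    densities: (N,) non-negative volume densities at each sample point
    colors:    (N, 3) RGB colour predicted at each sample point
    deltas:    (N,) distance between consecutive samples along the ray
    Returns the rendered pixel colour and the per-sample weights.
    """
    # Opacity of each ray segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): fraction of light reaching sample i
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    pixel = (weights[:, None] * colors).sum(axis=0)
    return pixel, weights

# Toy ray: empty space, then an opaque red surface at the third sample.
densities = np.array([0.0, 0.0, 1e3, 0.0])
colors = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]], dtype=float)
deltas = np.full(4, 0.1)
pixel, weights = render_ray(densities, colors, deltas)
print(pixel)  # approximately [1, 0, 0]: the red surface hides the white sample behind it
```

Because the weights fall to zero once the ray hits an opaque surface, the same rendering rule recovers both appearance and geometry from the captured views.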

In addition to a highly detailed neural radiance field, F3RM builds a feature field that augments the geometry with semantic information. The system uses CLIP, a vision foundation model trained on a vast dataset of images, to learn visual concepts efficiently. By distilling the 2D CLIP features of the images taken with the selfie stick into this field, F3RM lifts those features into a 3D representation.
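A feature field can be rendered the same way as colour: the same compositing weights are applied to a feature vector at each sample instead of an RGB triple, and the rendered result is trained to match the 2D CLIP feature at that pixel. The sketch below assumes a simple squared-error distillation loss and uses hypothetical names (`render_feature`, `distillation_loss`) and toy 4-dimensional features in place of real CLIP features; F3RM’s actual architecture and loss may differ.

```python
import numpy as np

def compositing_weights(densities, deltas):
    """Standard NeRF volume-rendering weights for the samples along one ray."""
    alphas = 1.0 - np.exp(-densities * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return trans * alphas

def render_feature(densities, features, deltas):
    """Composite (N, D) per-sample feature vectors into one D-dim pixel feature."""
    w = compositing_weights(densities, deltas)
    return (w[:, None] * features).sum(axis=0)

def distillation_loss(rendered, target):
    """Assumed squared-error loss against the 2D CLIP feature for the same pixel."""
    return float(np.sum((rendered - target) ** 2))

# Toy ray with an opaque surface at the second sample; D = 4 instead of CLIP's size.
densities = np.array([0.0, 1e3, 0.0, 0.0])
features = np.eye(4)                      # one hypothetical feature vector per sample
deltas = np.full(4, 0.1)
rendered = render_feature(densities, features, deltas)
target = np.array([0.0, 1.0, 0.0, 0.0])   # hypothetical 2D CLIP feature at this pixel
print(distillation_loss(rendered, target))  # ~0: rendered feature matches the target
```

Reusing the density-derived weights is what ties the semantic features to the same 3D geometry as the radiance field.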

After receiving a few demonstrations, the robot applies its knowledge of geometry and semantics to grasp objects it has never encountered before. When a user submits a text query, the robot searches the space of possible grasps for those most likely to pick up the requested object. Each candidate is scored on its relevance to the prompt, its similarity to the robot’s training demonstrations, and whether it would cause a collision. The highest-scoring grasp is then executed.
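The grasp-selection step can be pictured as score-and-pick. In this sketch the `GraspCandidate` fields, the weighted-sum score and the weights `w_lang` and `w_demo` are all hypothetical: the article only says that prompt relevance, demonstration similarity and collisions are considered, not how they are combined. Here collisions are treated as a hard filter, which is one plausible reading.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GraspCandidate:
    """Hypothetical record for one candidate grasp pose."""
    label: str
    language_score: float  # relevance to the user's text prompt
    demo_score: float      # similarity to the few-shot demonstrations
    collides: bool         # True if executing the grasp would cause a collision

def select_grasp(candidates: List[GraspCandidate],
                 w_lang: float = 1.0,
                 w_demo: float = 1.0) -> Optional[GraspCandidate]:
    """Drop colliding grasps, then return the highest-scoring remaining one."""
    feasible = [c for c in candidates if not c.collides]
    if not feasible:
        return None  # no safe grasp available
    return max(feasible, key=lambda c: w_lang * c.language_score + w_demo * c.demo_score)

candidates = [
    GraspCandidate("mug handle", 0.9, 0.8, collides=True),
    GraspCandidate("mug rim", 0.7, 0.6, collides=False),
    GraspCandidate("mug body", 0.5, 0.4, collides=False),
]
print(select_grasp(candidates).label)  # mug rim: best score among collision-free grasps
```

The top-scoring "mug handle" grasp is rejected because it collides, so the best remaining candidate is executed instead.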

F3RM also lets users specify the desired object at different levels of detail in natural language. For instance, if there is both a metal mug and a glass mug, the user can ask for the “glass mug.” Even when two objects fit the same description, such as two glass mugs, one filled with coffee and the other with juice, the user can ask for the “glass mug with coffee.” This open-ended understanding comes from the foundation model features embedded in the feature field.
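Queries like these are commonly matched by cosine similarity between a text embedding and the visual features in the scene. The sketch below uses hand-made 4-dimensional vectors (dimensions loosely standing for “mug”, “glass”, “coffee” and “juice”) as hypothetical stand-ins for CLIP embeddings; in a real system the query vector would come from CLIP’s text encoder and the object features from the feature field.

```python
import numpy as np

def cosine_similarities(features, query):
    """Cosine similarity between each row of `features` and the `query` vector."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return f @ q

# Hypothetical toy embeddings; dimensions ~ [mug, glass, coffee, juice].
objects = ["metal mug", "glass mug with juice", "glass mug with coffee"]
features = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
])
query = np.array([1.0, 1.0, 1.0, 0.0])  # stand-in for "glass mug with coffee"

sims = cosine_similarities(features, query)
print(objects[int(np.argmax(sims))])  # glass mug with coffee
```

With these toy vectors, the vaguer query “glass mug” scores both glass mugs equally, which is why the more specific “glass mug with coffee” is needed to single one out.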
