Advancing Object Recognition in Autonomous Driving

MIT and a collaborative research initiative have developed an efficient computer vision model that enables autonomous vehicles to recognise objects rapidly and accurately, even in high-resolution images. The model reduces computational complexity, allowing real-time semantic segmentation on devices with limited hardware resources, such as the on-board computers in autonomous vehicles, so that decisions can be made quickly.

Recent cutting-edge semantic segmentation models directly capture interactions between pixels in an image, so their computational demands grow quadratically with image resolution. This limits real-time processing on edge devices such as sensors or mobile phones.

MIT researchers have introduced a novel building block for semantic segmentation models that matches the capabilities of these state-of-the-art models but has linear computational complexity and uses hardware-friendly operations. As a result, the new model series improves high-resolution computer vision, running up to nine times faster on mobile devices while maintaining or surpassing accuracy levels.

Beyond aiding real-time decisions in autonomous vehicles, this technique has potential applications to enhance efficiency in other high-resolution computer vision tasks, including medical image segmentation.

According to Song Han, a senior author of the paper and an associate professor in the Department of Electrical Engineering and Computer Science (EECS) at MIT, traditional vision transformers have been in use for a significant period and yield impressive results, but the researchers want to draw attention to the efficiency of these models. Their research demonstrates that computational requirements can be reduced significantly, enabling real-time image segmentation to run locally on a device.

Han acknowledged that categorising each pixel in a high-resolution image with potentially millions of pixels poses an intricate challenge for machine learning models. He explained that recently, a highly effective model called a vision transformer has emerged, proving its effectiveness in addressing this challenge.

Initially designed for natural language processing, transformers represent each word in a sentence as a token and then create an attention map to capture the relationships between all tokens, facilitating contextual understanding during predictions. Similarly, a vision transformer applies this concept to images by dividing them into patches of pixels and encoding each patch into a token, subsequently generating an attention map.
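The patch step described above can be sketched in a few lines. This is a minimal illustration only: the function name `image_to_patch_tokens`, the toy 4x4 single-channel image and the patch size of 2 are assumptions chosen for readability, and the learned projection that a real vision transformer applies to each patch is left out.

```python
def image_to_patch_tokens(image, patch_size):
    """Split a 2D image (list of rows) into flattened square patches.

    Each returned token is the list of pixel values in one patch, read
    row by row. A real vision transformer would then map every token
    through a learned linear projection, which is omitted here.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            tokens.append(patch)
    return tokens


# Toy 4x4 image whose pixel values are simply 0..15 (hypothetical data).
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patch_tokens(toy_image, patch_size=2)
print(len(tokens))   # 4 patches
print(tokens[0])     # [0, 1, 4, 5], the top-left 2x2 patch
```

The attention map is then computed between these tokens rather than between individual pixels.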

This attention map relies on a similarity function to directly learn pixel interactions, resulting in a global receptive field that allows the model to access all relevant parts of the image. However, when dealing with high-resolution images comprising millions of pixels organised into thousands of patches, the attention map becomes exceedingly large, leading to quadratic growth in computational demands as image resolution increases.
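To make the quadratic growth concrete, the sketch below counts tokens and attention-map entries for two image sizes. The helper name `attention_map_entries`, the patch size of 16 and the two resolutions are illustrative assumptions, not figures from the research.

```python
def attention_map_entries(height, width, patch_size):
    """Return (number of tokens, number of attention-map entries).

    The attention map stores one similarity score for every pair of
    tokens, so its size is the square of the token count.
    """
    tokens = (height // patch_size) * (width // patch_size)
    return tokens, tokens * tokens


# Assumed example resolutions with 16x16 patches.
small_tokens, small_entries = attention_map_entries(1024, 2048, 16)
large_tokens, large_entries = attention_map_entries(2048, 4096, 16)

print(small_tokens, small_entries)    # 8192 67108864
print(large_tokens, large_entries)    # 32768 1073741824
print(large_entries // small_entries) # 16
```

Doubling the image's height and width quadruples the number of tokens but makes the attention map sixteen times larger, which is why full attention becomes impractical at high resolution.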

In their novel model series, known as EfficientViT, the MIT researchers simplified the creation of the attention map by substituting the nonlinear similarity function with a linear one. This alteration allowed them to rearrange the order of operations, reducing the overall computational workload without changing functionality or sacrificing the global receptive field. Consequently, their model exhibits linear growth in computation requirements as image resolution increases.
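The reordering can be illustrated with a small pure-Python sketch. It is not the researchers' implementation: the article does not name the linear similarity function, so a ReLU feature map is assumed here, and all function names and the toy matrices `Q`, `K` and `V` are hypothetical. With a linear similarity, the product (QKᵀ)V can be regrouped as Q(KᵀV), which avoids building the N x N attention map while giving the same result.

```python
def relu(matrix):
    return [[max(0.0, x) for x in row] for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


EPS = 1e-6  # guards against division by zero; an assumption of this sketch


def attention_via_full_map(q, k, v):
    """Builds the full N x N similarity map first: cost grows with N squared."""
    q, k = relu(q), relu(k)
    scores = matmul(q, transpose(k))          # N x N
    out = []
    for row in scores:
        norm = sum(row) + EPS
        out.append([x / norm for x in matmul([row], v)[0]])
    return out


def attention_reordered(q, k, v):
    """Computes K^T V once (d x d), then applies it per token: cost grows with N."""
    q, k = relu(q), relu(k)
    kv = matmul(transpose(k), v)              # d x d_v, independent of N
    k_sum = [sum(col) for col in zip(*k)]     # length d
    out = []
    for q_row in q:
        norm = sum(a * b for a, b in zip(q_row, k_sum)) + EPS
        out.append([x / norm for x in matmul([q_row], kv)[0]])
    return out


# Toy inputs: 4 tokens, feature dimension 2 (hypothetical values).
Q = [[1.0, 0.5], [0.2, -0.3], [0.7, 0.9], [-0.1, 0.4]]
K = [[0.3, 0.8], [0.6, -0.2], [0.1, 0.5], [0.9, 0.4]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full = attention_via_full_map(Q, K, V)
fast = attention_reordered(Q, K, V)
same = all(
    abs(x - y) < 1e-9 for row_a, row_b in zip(full, fast) for x, y in zip(row_a, row_b)
)
print(same)  # True
```

Both functions still let every token draw on every other token, so the global receptive field is preserved; only the grouping of the multiplications changes.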

However, this linear attention approach primarily captures global image context, leading to a decline in accuracy due to the loss of local information. The researchers integrated two additional components into their model to address this accuracy loss, each incurring minimal computational overhead. One of these elements assists in capturing local feature interactions, compensating for the linear function’s limitations in local information extraction.

The second element, a module enabling multiscale learning, facilitates recognising large and small objects. The researchers emphasised the delicate balance between performance and efficiency in their design. EfficientViT is engineered with a hardware-friendly architecture, making it suitable for deployment on various devices, such as virtual reality headsets and edge computers in autonomous vehicles. Moreover, this model can be applied to diverse computer vision tasks, including image classification.

In tests on semantic segmentation datasets, the researchers found their model, EfficientViT, performed up to nine times faster on Nvidia GPUs compared to other popular vision transformer models, maintaining or surpassing accuracy. This advancement enables the model to run efficiently on mobile and cloud devices. The researchers plan to extend this technique to accelerate generative machine-learning models and develop EfficientViT for various vision-related tasks.

