
Machine Learning for More Efficient Hash Functions

Researchers from MIT and elsewhere have used machine learning to explore whether a better hash function can be built. Their findings show how a hash function tailored to a dataset can speed up database searches.

The researchers found that collisions could be reduced by using learned models instead of traditional hash functions. A learned model is produced by running a machine-learning algorithm on a dataset to identify its key features. Their experiments also showed that learned models were often more computationally efficient than perfect hash functions.

“In this study, we discovered that there are circumstances in which it is possible to find a more optimal compromise between the time required to compute the hash function and the likelihood of collisions. In these cases, the computation time for the hash function can be increased a little. Still, at the same time, its collisions can be significantly reduced,” said Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.

Hashing is used in many contexts, including database indexing, data compression and cryptography, so hash functions must be fast and efficient. Many online databases, such as library catalogues and e-commerce websites, rely on hashing. A hash function generates codes that indicate where each piece of data is stored, which makes it easier to find and retrieve the data later.

Traditional hash functions generate these codes randomly, so two pieces of data can end up with the same hash value. This is a collision: a search for one specific item points to several items that share a hash value. It then takes longer to zero in on the correct one, which slows down searches and reduces performance.
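To make the idea of a collision concrete, here is a minimal Python sketch. It is not taken from the study: the helper names `slot_for` and `collision_ratio`, the use of SHA-256 as a stand-in for a traditional hash function, and the definition of a collision as a key landing in an already occupied slot are all assumptions made for illustration.

```python
import hashlib


def slot_for(key: int, num_slots: int) -> int:
    """Map a key to a storage slot with a general-purpose hash.

    SHA-256 is an assumption here, chosen only because it is deterministic
    and spreads keys in an effectively random way.
    """
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def collision_ratio(slots: list[int]) -> float:
    """Fraction of keys that land in a slot some other key already occupies."""
    return (len(slots) - len(set(slots))) / len(slots)


# 1,000 consecutive IDs hashed into 1,000 slots.
keys = list(range(1000, 2000))
slots = [slot_for(k, len(keys)) for k in keys]
ratio = collision_ratio(slots)
print(f"keys landing in an occupied slot: {ratio:.1%}")
```

Even though the keys are perfectly regular, the hash scatters them without regard to that regularity, so a sizeable share of them end up sharing a slot.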

Perfect hash functions are a class of hash functions designed to place data in a way that eliminates collisions. However, they must be built specially for each dataset, and they are slower to compute than traditional hash functions. The new findings suggest that learned models can reduce the number of collisions in some situations. The method could therefore speed up computing systems that scientists use to store and analyse biological information such as DNA and amino acid sequences.
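The per-dataset cost of a perfect hash function can be illustrated with a deliberately naive sketch: try seeds one at a time until the seeded hash happens to map every key to its own slot. This is an assumption-laden toy, not how production perfect-hashing schemes work and not the method evaluated in the study; `seeded_slot`, `find_perfect_seed` and the sample DNA-like keys are hypothetical.

```python
import hashlib
from typing import Optional


def seeded_slot(key: str, seed: int, num_slots: int) -> int:
    """Hash a key together with a seed and map the result to a slot."""
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % num_slots


def find_perfect_seed(keys: list[str], num_slots: int,
                      max_tries: int = 100_000) -> Optional[int]:
    """Search for a seed under which no two keys share a slot.

    The search has to be repeated whenever the key set changes, which is
    the sense in which the function is tailored to one dataset.
    """
    for seed in range(max_tries):
        used = {seeded_slot(k, seed, num_slots) for k in keys}
        if len(used) == len(keys):
            return seed
    return None


# Hypothetical short DNA fragments used as keys.
keys = ["ACGT", "TTGA", "GGCA", "CATG", "AGTC"]
seed = find_perfect_seed(keys, num_slots=8)
print("collision-free seed:", seed)
```

With five keys and eight slots the search finishes quickly, but the number of seeds that must be tried grows rapidly as the table fills up, which hints at why such functions are costly to construct.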

When data are distributed predictably, learned models could reduce the proportion of collisions in a dataset from 30% to 15% compared with traditional hash functions. They also outperformed perfect hash functions. In the best cases, learned models cut execution time by as much as 30%.

When the researchers investigated the use of learned models for hashing, they found that throughput was affected most by the number of sub-models. Each learned model is made up of several simpler linear models, each of which approximates part of the data distribution. With more sub-models, the learned model's approximation improves, but it takes longer to compute.

Only a certain number of sub-models is needed to build the approximation that the hash function requires. Beyond that point, Sabek believes, adding more sub-models will not reduce collisions any further.
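The sub-model idea, and the plateau Sabek describes, can be sketched in Python. This is a simplified illustration rather than the researchers' implementation: the class name `LinearSubmodelHash`, the restriction to integer keys, the choice to "fit" each sub-model by drawing a line through the first and last key of its segment, and the synthetic dataset are all assumptions.

```python
import bisect


class LinearSubmodelHash:
    """Toy learned hash: each sub-model is a line that estimates a key's rank."""

    def __init__(self, keys: list[int], num_submodels: int, num_slots: int):
        ks = sorted(keys)
        self.n = len(ks)
        self.num_slots = num_slots
        # Split the sorted keys into segments of roughly equal size.
        bounds = [i * (self.n - 1) // num_submodels
                  for i in range(num_submodels + 1)]
        self.starts = [ks[lo] for lo in bounds[:-1]]
        # Each segment stores (first key, last key, first rank, last rank).
        self.segments = [(ks[lo], ks[hi], lo, hi)
                         for lo, hi in zip(bounds, bounds[1:])]

    def slot(self, key: int) -> int:
        # Pick the sub-model whose key range contains the key.
        i = max(bisect.bisect_right(self.starts, key) - 1, 0)
        x0, x1, lo, hi = self.segments[i]
        # Linear interpolation of the rank, in exact integer arithmetic.
        rank = lo if x1 == x0 else lo + (key - x0) * (hi - lo) // (x1 - x0)
        rank = min(max(rank, 0), self.n - 1)
        return rank * self.num_slots // self.n


def collision_ratio(slots: list[int]) -> float:
    """Fraction of keys that land in an already occupied slot."""
    return (len(slots) - len(set(slots))) / len(slots)


# Synthetic, predictable data: four regions, each with a constant gap.
keys, k = [], 0
for gap in (1, 7, 3, 20):
    for _ in range(500):
        k += gap
        keys.append(k)

ratios = {}
for m in (1, 4, 16):
    model = LinearSubmodelHash(keys, num_submodels=m, num_slots=len(keys))
    ratios[m] = collision_ratio([model.slot(key) for key in keys])
    print(f"{m:>2} sub-models: {ratios[m]:.1%} collisions")
```

On this toy dataset a single line is a poor fit for four differently spaced regions, so many keys collide. Four sub-models match the four regions and remove every collision, and going to 16 sub-models changes nothing except the amount of model state, which mirrors the plateau described above.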

The researchers aim to extend this work by using learned models to design hash functions for other types of data. The group also intends to investigate learned hashing for transactional databases, where data are inserted, updated and deleted. Each such change requires the model to be revised, and revising a model without sacrificing accuracy is challenging.

“We’d want to inspire the community to include machine learning into their standard algorithms and data structures. Then, we can apply machine learning to capture data attributes better and achieve higher performance with virtually any fundamental data structure. There is still a lot we can investigate,” Sabek added.

