Māori loanwords project becomes easier with machine learning

Researchers from the University of Waikato in New Zealand used a machine learning model to narrow a collection of 8 million tweets down to a more manageable 1.2 million, in order to study how te reo Māori is being used on Twitter.

According to a recent press release, the team focused on 77 Māori loanwords, or te reo Māori words used in an English context, and used them as training data for their machine learning model.

Machine learning allows data scientists to provide a computer with a large data set, and teach it to make predictions based on that data.

The initial 8 million tweets contained a fair amount of distracting data 'noise'. Irrelevant tweets were those in which the loanword was not used in a New Zealand English context, or which were otherwise unrelated.

First, the team manually coded about four thousand tweets, then trained a machine learning model on them to weed out the irrelevant ones.
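The filtering step can be illustrated with a small sketch. The article does not say which model the team trained, so the Naive Bayes classifier below is only a stand-in, and the example tweets, labels, and the `NaiveBayesFilter` name are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (keeps macronised vowels)."""
    return re.findall(r"[a-zāēīōū']+", text.lower())


class NaiveBayesFilter:
    """Tiny multinomial Naive Bayes classifier with add-one smoothing (hypothetical)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of tweets
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior for this label
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented, hand-labelled examples standing in for the manually coded tweets.
train_texts = [
    "great kai at the whanau barbecue today",
    "sharing kai with the whanau after work",
    "the whanau came over for some kai",
    "kai is my favourite kpop singer",
    "new album from kai drops tonight",
    "kai concert tickets sold out",
]
train_labels = ["relevant", "relevant", "relevant",
                "irrelevant", "irrelevant", "irrelevant"]

model = NaiveBayesFilter().fit(train_texts, train_labels)

# Apply the trained model to unlabelled tweets and keep only the relevant ones.
unlabelled = ["bringing kai for the whanau", "kai album tickets tonight"]
kept = [t for t in unlabelled if model.predict(t) == "relevant"]
```

The point of the sketch is the workflow rather than the model: a few thousand hand-labelled tweets are enough to train a classifier that can then be applied to millions of unlabelled ones.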

They then used a machine learning technique, developed by a major multinational search engine company, to automatically extract the meaning of words according to their context.

There is a plan to grow this project into a dissertation, in which the team will ask further questions of the data they have gathered.

The team wants to know whether the people who tweeted are mainly te reo speakers and, if not, why they use the loanwords.

Their analysis involves finding the other words associated with the Māori loanwords, which gives a different kind of insight into how the loanwords are being used.

A dictionary tends to give what a word means abstractly, out of context, or with a synonym or two.

But in this case, they have more of a network of related words, which may not have the same meaning but seem to occur in the same contexts.
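One way to picture such a network of related words is to compare the contexts in which words appear. The sketch below uses raw co-occurrence counts and cosine similarity, which is a much simpler stand-in for the technique the team used (the article does not name it); the toy tweets and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tweets):
    """Map each word to a Counter of the other words in the same tweet."""
    vectors = defaultdict(Counter)
    for tweet in tweets:
        words = tweet.lower().split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_words(target, vectors, top_n=3):
    """Rank other words by how similar their contexts are to the target's."""
    target_vec = vectors.get(target, Counter())
    scores = {
        word: cosine(target_vec, vec)
        for word, vec in vectors.items()
        if word != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented toy tweets; the real corpus holds over a million.
tweets = [
    "love my whanau so much",
    "love my family so much",
    "dinner with whanau tonight",
    "dinner with family tonight",
    "the bus was late again",
]

vectors = cooccurrence_vectors(tweets)
nearest = related_words("whanau", vectors, top_n=1)
```

In this toy data "whanau" and "family" never appear in the same tweet, yet they rank as closest because they share the same surrounding words, which is the sense in which words are related by context rather than by dictionary definition.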

One of the researchers, Dr Calude, has been involved in research on Māori loanwords in newspapers as part of a Marsden-funded project.

The Marsden Fund is the primary mechanism in New Zealand for funding pure research, which is undertaken solely to increase knowledge.

During the manual coding phase, she already noticed a difference in how the words are used in tweets: compared with newspapers, the loanwords are more integrated.

There is more language mixing, with full sections of Māori and full phrases in English appearing together. This makes it similar to code switching, which is what bilinguals do.

The theory behind the project has existed for quite a while, but combining it with machine learning has allowed the team to create a remarkably large and accurate corpus of words to analyse.

The researchers want to make it possible for others to do the same, so they are sharing what they have learned on an open-source platform.

They are adding to the website as they go along, so it is a growing resource.

