Tech companies operating within Hong Kong’s Smart Government Innovation Lab announced the roll-out of solutions that are now ready to be acquired by companies and institutions.
Solution I – Website Extraction Solution
Solution description
For businesses of any size, scraping the web for data to fuel market research offers a broad and insightful view of their industry. Acquiring that data manually is a mundane, arduous task, but one that well-designed web crawlers can automate.
To this end, a company under Hong Kong’s Smart Government Innovation Lab has developed a Website Extraction Solution that converts unstructured website data into structured, ready-to-consume data. The solution is built on a self-developed data automation platform, called DataCanva, which scrapes website information automatically and continuously, performs various data transformations and then outputs structured data ready for consumption through files, APIs and webhooks.
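As a rough illustration of what converting unstructured website data into structured data can look like, the minimal sketch below parses an HTML table into a list of records and prints them as JSON. It is not DataCanva's implementation, which the announcement does not describe; the names `TableExtractor`, `html_table_to_records` and `SAMPLE_HTML` are hypothetical, and only the Python standard library is used.

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_records(html):
    """Use the first row as field names and return the other rows as dicts."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample page fragment, standing in for a crawled page.
SAMPLE_HTML = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Widget</td><td>19.90</td></tr>
  <tr><td>Gadget</td><td>4.50</td></tr>
</table>
"""

records = html_table_to_records(SAMPLE_HTML)
print(json.dumps(records, indent=2))
```

In a real pipeline the records would be written to a file or pushed to an API or webhook rather than printed.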
The Website Extraction Solution has a number of proprietary technologies to enable data crawling at scale even on difficult sites:
- Anti-ban: the technology emulates a human visit session to avoid being banned.
- Auto-queuing: some sites place visitors in a queue when they are overloaded; this technology lets the crawlers wait in the virtual waiting room just as a human visitor would.
- Login: some sites require valid credentials and session-related mechanics in order to load more data; the technology works seamlessly in these scenarios.
- Deep crawling: the technology targets not only web pages but also attachments such as Word and PDF files.
- Natural Language Analysis: the technology can extract key phrases and key sentences, and perform summarisation if needed.
- Data Change Detection: the technology extracts delta changes in data to minimise the data crawling workload and allow timely feedback.
- Rotational Proxy: the technology leverages a large pool of IP addresses to decrease latency and improve the success rate.
- Screen capture: the technology saves the screen in a PDF file for a historical snapshot of the website for future review.
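The data change detection feature listed above can be approximated with content fingerprints. The sketch below is one common way to find deltas between two crawls and is not necessarily how the solution works; the function names and the `id` key field are assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(record):
    """Return a stable SHA-256 digest of a record (a JSON-serialisable dict)."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(previous, current, key="id"):
    """Compare two crawls and report which record keys were added, removed or changed."""
    prev = {r[key]: fingerprint(r) for r in previous}
    curr = {r[key]: fingerprint(r) for r in current}
    return {
        "added": sorted(k for k in curr if k not in prev),
        "removed": sorted(k for k in prev if k not in curr),
        "changed": sorted(k for k in curr if k in prev and curr[k] != prev[k]),
    }


# Hypothetical results of two successive crawls of the same price list.
yesterday = [{"id": "a", "price": 1}, {"id": "b", "price": 2}]
today = [{"id": "a", "price": 1}, {"id": "b", "price": 3}, {"id": "c", "price": 5}]

delta = detect_changes(yesterday, today)
print(delta)
```

Only the records named in the delta need to be reprocessed or sent downstream, which is what keeps the crawling workload small.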
Application Areas
The solution was developed to be applied in the areas of Broadcasting, City Management, Climate and Weather, Commerce and Industry, Development, Education, Employment and Labour, Environment, Finance, Food, Health, Housing, Infrastructure, Law and Security, Population, Recreation and Culture, Social Welfare as well as Transport.
Technologies Used
The solution uses the latest in Artificial Intelligence (AI), Cloud Computing, Data Analytics, Deep Learning, Machine Learning, Natural Language Processing as well as Predictive Analytics.
Use case
The Website Extraction Solution is suitable for the following use cases:
- Market trend analysis
- Price monitoring (e.g., on major e-commerce websites)
- Research and development
- Competitor analysis
- News/alerts monitoring (e.g., for compliance monitoring)
- Profile analysis (i.e., retrieving data to enrich a user or company profile)
Solution II – Things of Artificial Intelligence (ToAI)
Solution description
ToAI is a unified platform that helps users collect and prepare data, build, train and deploy their ML models, and monitor and automatically retrain those models, with a focus on fast performance.
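To make the monitor-and-retrain idea concrete, the sketch below tracks a rolling mean absolute error and signals when a model should be retrained. It is an illustration under stated assumptions, not ToAI's actual mechanism: the class name `RetrainMonitor`, the window size and the error threshold are all hypothetical.

```python
from collections import deque


class RetrainMonitor:
    """Flags retraining when the rolling mean absolute error exceeds a threshold."""

    def __init__(self, window=5, threshold=1.0):
        self.errors = deque(maxlen=window)  # most recent absolute errors
        self.threshold = threshold

    def observe(self, predicted, actual):
        """Record one prediction error; return True when retraining is due."""
        self.errors.append(abs(predicted - actual))
        window_full = len(self.errors) == self.errors.maxlen
        mean_error = sum(self.errors) / len(self.errors)
        return window_full and mean_error > self.threshold

    def reset(self):
        """Clear the error history, for example after a retrain has completed."""
        self.errors.clear()


# Hypothetical sensor readings: the model keeps predicting 10.0 while reality drifts.
monitor = RetrainMonitor(window=3, threshold=1.0)
signals = [monitor.observe(10.0, actual) for actual in (10.0, 10.5, 13.0)]
print(signals)
```

A deployment would call `reset()` once the retrained model is live, so that errors from the old model do not trigger another retrain.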
Application Areas
The solution was designed to be applied in the areas of City Management, Climate and Weather, Environment, Health, Housing, Infrastructure as well as Transport.
Technologies Used
The solution uses the latest in Artificial Intelligence (AI), Data Analytics, Internet of Things (IoT) as well as Machine Learning.
Use case
Smart Equipment – the solution provides an Internet of Things (IoT) platform to connect various sensor devices installed on different equipment, to monitor operating conditions and improve safety, efficiency, effectiveness and endurance.
Air-conditioning Energy Optimisation – the solution also provides an Internet of Things (IoT) platform to connect various sensor devices installed across office workspaces, to monitor and optimise air-conditioning cooling operations.