The Unsung Hero: How Data Engineering Powers Insights for Public Health in Singapore
When we think about artificial intelligence (AI) and machine learning (ML), cutting-edge models, complex algorithms, and dazzling analytics usually come to mind. However, these eye-catching aspects rely on a fundamental, often overlooked foundation: data engineering. Without a solid data engineering framework, AI and ML projects are like skyscrapers built on quicksand – impressive-looking but unlikely to stand the test of time.
Data engineering is akin to the plumbing that keeps our homes functioning. While not as visible as the taps and showers it supports, it is crucial for ensuring smooth operations. Similarly, data engineering ensures that messy, raw data is cleaned, organised, and made usable. This is the foundation that allows data scientists and analysts to create insights and build transformative solutions.
GovTech’s Data Engineering Practice (DP) team plays a critical role in enabling impactful public sector projects. A prime example of this is its partnership with the Health Promotion Board (HPB) to develop the Population Health Data Hub (PHDH). This collaboration demonstrates how good data engineering transforms ambitious ideas into practical tools that drive meaningful outcomes.
Strengthening public health through data
With the country’s population hitting 6 million and continuing to climb, efforts to ensure a healthy society have become even more urgent. HPB’s initiatives across different health domains, such as nutrition, physical activity, mental well-being and health screening, exemplify the government’s commitment to fostering healthier lifestyles. An added bonus: they also generate valuable data to deepen our understanding of citizens’ lifestyles.
Combining information on citizens’ demographics, participation rates, and behavioural trends can help identify the citizen segments most in need of behaviour change. This data allows HPB to spot gaps and fine-tune its strategies to promote early detection, encourage healthy lifestyles, support smoking cessation, and nudge vaccination uptake, ultimately enhancing community health and reducing the prevalence of preventable diseases.
Laying the groundwork for transformation
HPB wants to make better use of the information it gathers. To do this, the agency plans to improve its data systems and train its staff in data handling. This will help HPB understand citizens' needs more deeply and provide more tailored services, while still protecting people's privacy. The goal is to boost HPB's ability to analyse data and build up its own team of experts. The initial partnership between HPB and DP spanned six months and focused on creating a strong data foundation for the PHDH, so that solutions implemented would remain effective in the long term.
The first step was to clean and transform HPB’s raw data, applying validation metrics defined by HPB so that the data met high standards of accuracy and reliability. The DP team also conducted an assessment of HPB’s existing and planned data models, identifying areas for improvement and offering recommendations to enhance the organisation’s data architecture.
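To make the idea of rule-based validation concrete, here is a minimal sketch in plain Python. The article does not describe HPB's actual validation metrics, so the field names (participant_id, age, screening_date) and the three rules below are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named validation rule; check returns True when a record passes."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules, not HPB's real validation metrics.
RULES = [
    Rule("participant_id_present", lambda r: bool(r.get("participant_id"))),
    Rule(
        "age_in_range",
        lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    ),
    Rule("screening_date_present", lambda r: bool(r.get("screening_date"))),
]


def validate(records, rules=RULES):
    """Split records into clean rows and rejected rows with the failed rule names."""
    clean, rejected = [], []
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            clean.append(record)
    return clean, rejected


def pass_rate(clean, rejected):
    """Share of records that passed every rule (1.0 for an empty batch)."""
    total = len(clean) + len(rejected)
    return len(clean) / total if total else 1.0
```

Keeping the rules as data rather than hard-coded checks means the owning agency can review or extend them without touching the cleaning logic, and the pass rate gives a simple metric to track over time.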
To future-proof the system, DP developed reusable data pipeline templates that ensured consistency across datasets. These templates not only reduced the need for bespoke scripts, but also served as a blueprint for future data modelling efforts. In addition, the team designed a structured testing framework that caught discrepancies early, reducing errors and rework.
The collaboration was a success. It sped up the process of collecting health screening information and allowed for more thorough and up-to-date analysis of how health conditions relate to lifestyle choices. It also gave HPB a ready-to-use model that can handle larger and more complex sets of information, laying a strong foundation for the next phase.
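One way to picture a reusable pipeline template paired with an early check is sketched below in plain Python. The article does not show DP's actual templates or testing framework, so every name here (build_pipeline, rename_columns, drop_duplicates, check_unique, the pid and participant_id columns) is a hypothetical stand-in.

```python
def rename_columns(mapping):
    """Return a step that renames columns according to mapping."""
    def step(rows):
        return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]
    return step


def drop_duplicates(key):
    """Return a step that keeps only the first row seen for each key value."""
    def step(rows):
        seen, out = set(), []
        for row in rows:
            if row[key] not in seen:
                seen.add(row[key])
                out.append(row)
        return out
    return step


def build_pipeline(*steps):
    """The template: compose parameterised steps into one callable."""
    def run(rows):
        for step in steps:
            rows = step(rows)
        return rows
    return run


def check_unique(rows, key):
    """A simple data test: True when no two rows share the same key value."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


# A dataset-specific pipeline is then only configuration, not a bespoke script.
screening_pipeline = build_pipeline(
    rename_columns({"pid": "participant_id"}),
    drop_duplicates("participant_id"),
)
```

A new dataset would reuse the same steps with different parameters, and a check such as check_unique can run on each output so that discrepancies surface before the data reaches analysts.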
Expanding and refining the system
Encouraged by the success of the first engagement, HPB extended its collaboration with GovTech’s DP team for another eight months to expand the scope of the PHDH and enhance its capabilities.
The DP team worked closely with HPB’s business teams to refine data models and develop additional pipelines. Early and frequent engagement with stakeholders ensured that requirements were well-understood, while proactive resolution of discrepancies improved the quality of the final outputs. By leveraging the templatised scripts developed in the first phase, the team was able to scale the system more quickly and efficiently.
For instance, the team worked closely with the Youth Preventive Service teams on data models and the data transformation logic to ensure that data assets produced were indeed practical for real-world use and met the needs of end users.
Advanced techniques were also introduced to handle complex data scenarios. The DP team implemented Slowly Changing Dimensions (SCDs), which allow organisations to track changes in data over time without compromising historical records. These enhancements were achieved using PySpark functions, ensuring that the system remained both robust and scalable.
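The sketch below illustrates the idea behind a Slowly Changing Dimension. The article does not say which SCD type was used, so this assumes Type 2, where a changed record is closed off and a new current version is appended. It is also written in plain Python rather than PySpark so that it stays self-contained, and the function name, column names (valid_from, valid_to, is_current) and sample data are all hypothetical.

```python
from datetime import date


def apply_scd2(dimension, updates, key, tracked, as_of):
    """Return a new dimension table with Type 2 history applied.

    dimension rows carry valid_from, valid_to (None while current) and is_current.
    updates rows carry the key plus the tracked attribute columns.
    """
    incoming = {row[key]: row for row in updates}
    result, current_keys = [], set()
    for row in dimension:
        new = incoming.get(row[key])
        if row["is_current"]:
            current_keys.add(row[key])
        changed = (
            row["is_current"]
            and new is not None
            and any(row[col] != new[col] for col in tracked)
        )
        if changed:
            # Close the old version instead of overwriting it, then add the new one.
            result.append({**row, "valid_to": as_of, "is_current": False})
            result.append(
                {**new, "valid_from": as_of, "valid_to": None, "is_current": True}
            )
        else:
            result.append(row)
    # Keys with no current version yet are inserted as new current rows.
    for k, new in incoming.items():
        if k not in current_keys:
            result.append(
                {**new, "valid_from": as_of, "valid_to": None, "is_current": True}
            )
    return result


dim = [
    {"participant_id": "P1", "region": "East",
     "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True},
]
updates = [
    {"participant_id": "P1", "region": "West"},
    {"participant_id": "P2", "region": "North"},
]
history = apply_scd2(dim, updates, "participant_id", ["region"], date(2024, 6, 1))
```

After this call, history holds three rows: the closed "East" version of P1, the current "West" version of P1, and a new current row for P2. In a distributed engine the same logic is typically expressed as a join between the dimension and the updates followed by a union of closed, new and unchanged rows.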
Unlocking the potential of health data
With access to more accurate and detailed information, HPB is now better equipped to analyse population health trends, identify key risk factors, and design targeted interventions. For example, the data hub enables the organisation to develop preventive care strategies tailored to specific demographics, improving the effectiveness of its programmes.
The benefits extend beyond technical capabilities. By working closely with GovTech’s DP team, HPB staff gained valuable hands-on experience in data architecture and data modelling processes. This collaboration not only enhanced their technical skills but also fostered a culture of data literacy within the organisation.
Building a culture of collaboration
The partnership between HPB and GovTech underscores the importance of collaboration in data engineering projects. By focusing on the fundamentals – cleaning, organising, and structuring data – GovTech’s DP team laid the groundwork for HPB to achieve its goals.
As Singapore continues to embrace technology to address its most pressing challenges, the role of data engineering will only grow in importance. Through projects like the Population Health Data Hub, GovTech and its partners are proving that when data is managed right, the possibilities are endless.
Transforming health data: GovTech's Data Engineering Practice and Health Promotion Board join forces
From left to right: Danush Aaditya, Yap Ghim Eng, Winston Lai, Neo Yi Lin, Albert Goh, Lim Kim Tee, Yap Lee Chen, Melissa Yeo, Ryan Lim, Joshua Na (Absent: Erwin, Andy Lam)
The Data Engineering Practice (DP), one of the practices within the Government Technology Office of GovTech, transfers data engineering expertise to the public sector, helping agencies improve their data quality, availability and accessibility.
At DP, we are focused on three key areas to deliver great products for the Digital Government:
1. Technology Exploration: Conduct proofs of concept and test emerging technologies
2. Innovation: Implement double-loop learning and provide expert consulting for use cases
3. Standardisation: Set whole-of-government standards, guides and frameworks