DAta Research (DAR - در) Lab - Pakistan

Research Ready COVID-19

The project focuses on cleaning and preparing data collected by the Pakistan Bureau of Statistics (PBS) to make it research-ready for assessing the socio-economic impact of COVID-19 on Pakistani households. The survey data, consisting of 31,204 observations from various provinces, initially had problems with quality, consistency, and accessibility. Extensive data curation has substantially improved the dataset, making it more accessible and actionable for researchers and policymakers seeking to understand the socio-economic impacts of the COVID-19 pandemic in Pakistan.

Methodology

The data cleaning process involved several critical steps to enhance the data's reliability and usability. First, multiple datasets were merged into a single cohesive file. Variables were systematically renamed and labeled using a standardized nomenclature to ensure clarity and ease of analysis. Binary responses were recoded to 0 and 1, and missing values were identified and distinguished from zero values, with which they had been conflated in the original data files provided by the PBS. For incomplete variables, dummy variables were created by duplicating the originals, and missing data were imputed in these duplicates where necessary.
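The recoding, missing-versus-zero, and duplicate-and-impute steps described above can be sketched as follows. This is a minimal illustration, not the project's code (the actual cleaning was done in Stata). The raw names `q101`/`q102`, the 1 = Yes / 2 = No coding, the routing rule, and median imputation are all hypothetical assumptions chosen for the example; merging is not shown.

```python
from statistics import median
from typing import Optional

# Hypothetical raw-to-standard names; the actual PBS variable names
# are not listed on this page.
RENAME = {"q101": "lost_job", "q102": "months_unemployed"}


def recode_binary(code) -> Optional[int]:
    """Map an assumed raw coding (1 = Yes, 2 = No) to 1/0; anything else is missing."""
    return {1: 1, 2: 0}.get(code)


def clean_row(raw: dict) -> dict:
    # Rename columns that have a standardized name; keep the rest unchanged.
    row = {RENAME.get(k, k): v for k, v in raw.items()}
    row["lost_job"] = recode_binary(row.get("lost_job"))
    # Assumed routing: months_unemployed is asked only when lost_job == 1,
    # so a 0 stored for anyone else means "not asked" (missing), not a true zero.
    if row["lost_job"] != 1:
        row["months_unemployed"] = None
    return row


def add_imputed_copy(rows: list, var: str) -> list:
    """Copy `var` into `<var>_imp` (the original stays untouched) and fill gaps
    among eligible respondents with the median of their observed values.
    Median imputation is only an illustrative choice."""
    eligible = [r for r in rows if r["lost_job"] == 1]
    observed = [r.get(var) for r in eligible if r.get(var) is not None]
    fill = median(observed) if observed else None
    for r in rows:
        value = r.get(var)
        if value is None and r["lost_job"] == 1:
            value = fill
        r[var + "_imp"] = value
    return rows


raw = [
    {"hhid": 1, "q101": 1, "q102": 4},
    {"hhid": 2, "q101": 2, "q102": 0},     # not asked: the 0 becomes missing
    {"hhid": 3, "q101": 1, "q102": None},  # asked but blank: imputed in the copy
    {"hhid": 4, "q101": 1, "q102": 6},
]
rows = add_imputed_copy([clean_row(r) for r in raw], "months_unemployed")
for r in rows:
    print(r["hhid"], r["lost_job"], r["months_unemployed"], r["months_unemployed_imp"])
```

Household 2's recorded 0 becomes missing because it was never asked, while household 3 keeps a missing `months_unemployed` but receives the median of the other eligible households (5.0) in `months_unemployed_imp`. Keeping the imputed values in a separate copy lets researchers choose whether to use them.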

Logic checks (assertions) were implemented in Stata to validate the consistency of responses, for example by verifying that age-specific questions were answered only by respondents of the relevant age and that responses followed the survey's skip patterns. These checks helped identify and correct errors, improving the overall quality of the dataset.
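The logic checks described above can be sketched as a set of named rules applied to every record. This is a hedged Python analogue of the Stata assertions, not the project's actual checks: the variable names (`attends_school`, `lost_job`, `months_unemployed`) and the 5-16 age range are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# Each rule returns True when a record is consistent. The variable names and
# the 5-16 age range are hypothetical stand-ins for the survey's real rules.
RULES: Dict[str, Callable[[dict], bool]] = {
    # Age-specific question: only members aged 5-16 should have an answer.
    "school_age_only": lambda r: (
        r.get("attends_school") is None or 5 <= r["age"] <= 16
    ),
    # Skip pattern: months_unemployed is answered only if lost_job == 1.
    "unemployment_skip": lambda r: (
        r.get("months_unemployed") is None or r.get("lost_job") == 1
    ),
}


def check(records: List[dict]) -> Dict[str, List[int]]:
    """Return, for each rule, the IDs of records that violate it."""
    return {
        name: [r["id"] for r in records if not rule(r)]
        for name, rule in RULES.items()
    }


records = [
    {"id": 1, "age": 10, "attends_school": 1, "lost_job": None, "months_unemployed": None},
    {"id": 2, "age": 40, "attends_school": 1, "lost_job": 0, "months_unemployed": None},
    {"id": 3, "age": 35, "attends_school": None, "lost_job": 0, "months_unemployed": 3},
]
violations = check(records)
print(violations)  # {'school_age_only': [2], 'unemployment_skip': [3]}
```

Stata's `assert` stops a do-file as soon as a condition fails; collecting the failing IDs instead lets every error be reviewed and corrected in one pass. A hard stop can still be added afterwards with `assert not any(violations.values())`.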

Another significant addition is the georeferencing of the data to the division level, which enables more localized analysis of socio-economic impacts across regions. The dataset also includes geographical and climate-related variables that may shed light on factors influencing the spread of COVID-19, strengthening both our analysis and future research that uses the data.
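Georeferencing to the division level and attaching division-level variables amounts to a lookup followed by a join. The sketch below assumes each household record carries a district code that maps to a division. The codes `D01`/`DIV_A` and the temperature and elevation values are placeholders, not real PBS codes or measurements.

```python
from typing import Dict, List, Optional

# Hypothetical lookup tables: codes and values are placeholders, not real
# PBS district/division codes or climate measurements.
DISTRICT_TO_DIVISION: Dict[str, str] = {"D01": "DIV_A", "D02": "DIV_A", "D03": "DIV_B"}
DIVISION_VARS: Dict[str, dict] = {
    "DIV_A": {"mean_temp_c": 25.0, "elevation_m": 200},
    "DIV_B": {"mean_temp_c": 18.0, "elevation_m": 1500},
}


def georeference(households: List[dict]) -> List[dict]:
    """Attach a division code and that division's variables to each household.
    Households whose district is not in the lookup get division=None and no
    extra variables, so gaps stay visible instead of being silently filled."""
    out = []
    for hh in households:
        division: Optional[str] = DISTRICT_TO_DIVISION.get(hh.get("district"))
        out.append({**hh, "division": division, **DIVISION_VARS.get(division, {})})
    return out


rows = georeference([{"hhid": 1, "district": "D03"}, {"hhid": 2, "district": "D99"}])
```

Here the first household is assigned `DIV_B` and its placeholder climate variables, while the unknown district `D99` is left with `division` set to `None` so that unmatched records can be reviewed rather than dropped.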

Status

The first release of the research-ready dataset is expected to be available by the end of September 2024.

Team Members

Principal Investigator

Student Researchers
