# I. Introduction

In this study, we test whether demographic characteristics, socio-economic factors and public policy parameters are significant determinants of new COVID-19 cases. Our empirical framework follows a multivariate negative binomial regression model, covering the 10 countries having the highest number of confirmed cases of the virus per million people.

Several studies have shown that pre-existing health conditions and non-communicable diseases (NCDs) increase both the incidence of the virus and the related mortality rate (Banik et al., 2020; Chan et al., 2020; Mathur & Rangamani, 2020). Singh and Misra (2020) show, in a meta-analysis of pooled studies from China, the USA, and Italy, that the severity of COVID-19 is associated with other non-communicable diseases.

Although COVID-19 affects all age groups, people above the age of 65 are more vulnerable to the infection than the younger population. However, obesity and smoking habits among the younger population make them equally vulnerable to COVID-19 (WHO, 2020a). Regions with a higher prevalence of smoking are found to have more COVID-19 cases, even among the younger population (Yu, 2020). Apart from NCDs and age-related factors, government pharmaceutical and non-pharmaceutical policies also affect the incidence of COVID-19 (Hale et al., 2020; OECD, 2020). Allel et al. (2020) find that a delay in government lockdown responses significantly affected the incidence rate ratios (IRR) of COVID-19 cases. The health care capacity of a country is another important determinant of pandemic preparedness (Chaudhry et al., 2020; Khan et al., 2020; Kraef et al., 2020; Mbunge, 2020). Health inequalities due to inadequate health capacity directly affect vulnerable people (Bambra et al., 2020).

Inspired by the above findings, this study helps to understand the country-specific factors and government responses in the countries with the highest number of confirmed cases. The study period is divided into two parts: the first part covers March to May 2020 (complete or partial lockdown in most countries) and the second part covers June to September 2020 (lockdown relaxed in most countries). The study is carried out for the 10 countries with the greatest number of active COVID-19 cases per million people as of 30 September 2020. We examine the country-specific effects of social, demographic, and health-related risk factors, along with government measures to contain the spread of COVID-19, on the number of new COVID-19 cases, for the two time periods distinguished by country lockdowns.

Our research focuses on the determinants of the incidence of COVID-19 and is inspired by the variation in the incidence of the virus, both within and across countries. The COVID-19 virus had spread to most parts of the world by March 2020. The World Health Organization (WHO) declared Europe the new epicentre of the virus on 13 March 2020, and more than 1 million people were affected worldwide by 4 April 2020 (WHO, 2020b). Persons with pre-existing health conditions were deemed to be the most vulnerable and have had the highest mortality rate due to COVID-19. This study provides an understanding of the gaps in government policies in the countries with the highest number of cases, by dividing the pandemic period into pre- and post-lockdown phases. Moreover, by emphasising the role of NCDs, age, population density, and the human development index, we consider the demographic and socio-economic factors along with the government policies. This is one of our contributions to the literature, particularly from a policy point of view, given that policy design depends on understanding the determinants of the virus over time. To date, there is no study that covers all these aspects in the manner we do. We find that demographic factors and government policies influence the incidence of new cases, while socio-economic factors have a limited role. In the next section, data and methodology are discussed, followed by empirical results and analysis in Section III and, finally, the conclusion and policy implications in Section IV.

# II. Methodology and Data

## A. Methodology

The association between the country-level factors and the number of new COVID-19 cases can be studied by fitting multivariate negative binomial regression models for both the pre- and post-lockdown sub-samples. This helps us understand the most relevant factors in explaining the variation in the number of new cases. A negative binomial regression is used when the dependent variable is a count variable that takes non-negative integer values. Negative binomial regression is a generalisation of Poisson regression that relaxes the ‘variance equal to the mean’ assumption made by the Poisson model.
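As a rough illustration of the over-dispersion that motivates this choice, the following Python sketch compares the sample mean and sample variance of a count series. The daily counts are invented purely for illustration and are not taken from the data used in this study, and the function name `dispersion_ratio` is hypothetical. This is only a crude unconditional check, whereas the model's assumption concerns the conditional variance; still, a ratio well above 1 suggests that the Poisson equal-variance assumption is unlikely to hold.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean of a count series."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical daily new-case counts (illustrative only, not study data).
daily_cases = [12, 40, 7, 95, 18, 230, 3, 61, 150, 9]

ratio = dispersion_ratio(daily_cases)
print(f"mean = {mean(daily_cases):.1f}, "
      f"variance = {variance(daily_cases):.1f}, "
      f"variance/mean = {ratio:.1f}")
```

A Poisson-distributed series would give a ratio close to 1; values far above 1 point towards a model with a separate dispersion parameter, such as the negative binomial.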

In a negative binomial regression, the mean of the dependent variable is determined by the exposure time, *t*, and a set of regressor variables, and each regressor variable has the same length of observation time. The slope coefficients can be estimated with consistency provided that the conditional mean is specified correctly, and the standard errors obtained are robust to possible misspecification of the distribution. This is like linear regression, except that consistent estimates and robust inferences can be obtained even when the normality assumption does not hold (Cameron & Trivedi, 2013).

Due to the large variation in the number of confirmed new COVID-19 cases per thousand people (*CC*), within countries and regions, the distribution of *CC* is over-dispersed (OECD, 2020). In this case, the Poisson distribution is inadequate to fit the model, as it has only one parameter, \(\mu\), and imposes the assumption that the variance equals the mean. Negative binomial regression is appropriate for over-dispersed count data, that is, when the conditional variance exceeds the conditional mean. The traditional negative binomial regression model is given by

\[ \begin{align} \ln(\mu_{CC})_{t} &= \beta_{0} + \beta_{1}x_{1t} + \beta_{2}x_{2t} + \ldots + \beta_{p}x_{it} \hspace{8mm}(1)\\ \ln(\mu_{CC})_{i} &= \beta_{0} + \beta_{1}x_{1t} + \beta_{2}x_{2t} + \ldots + \beta_{p}x_{it} + \alpha_{i} + \epsilon_{it} \end{align} \]

where \(\mu_{CC}\) is the conditional mean of the dependent variable \(CC\); the predictor variables \(x_{1t},\ x_{2t},\ \ldots,\ x_{it}\) are given; \(\alpha_{i}\) is the individual-specific fixed effect term; \(\epsilon_{it}\) is the residual term; and the population regression coefficients \(\beta_{0},\ \beta_{1},\ \beta_{2},\ \ldots,\ \beta_{p}\) are to be estimated. A regression coefficient in a negative binomial regression is interpreted as the difference between the logarithms of the expected counts at two consecutive values of the predictor, \(x_{0}\) and \(x_{0+1}\), and written as:

\[\beta = \log(\mu_{x_{0+1}}) - \log(\mu_{x_{0}})\hspace{30mm}(2)\]

where \(\beta\) is the regression coefficient, and \(\mu_{x_{0+1}}\) and \(\mu_{x_{0}}\) are the expected counts of the dependent variable when the predictor variable is evaluated at \(x_{0+1}\) and \(x_{0}\), respectively; \(x_{0+1}\) denotes a one-unit increase in the predictor variable, \(x\), from \(x_{0}\). Alternatively, Equation (2) can be written as:

\[\log(\mu_{x_{0+1}}) - \log(\mu_{x_{0}}) = \log(\mu_{x_{0+1}} / \mu_{x_{0}})\hspace{20mm}(3)\]

The ratio of expected counts inside the logarithm on the right-hand side of Equation (3) is the *IRR*, so the *IRR* for a variable is obtained by exponentiating its coefficient. It is interpreted as the factor by which the expected value of the dependent variable changes for a one-unit increase in the predictor variable. The sign of the coefficient shows whether the dependent variable increases or decreases with incremental changes in the predictor variable, and the *IRR* shows the magnitude of the rate of change. We report both the estimated coefficients and the *IRR* values in order to infer the effect of country-specific factors on the incidence of daily new COVID-19 cases.
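The relationship in Equations (2) and (3) can be checked numerically. The sketch below assumes a single predictor and a log link, with made-up values for the intercept and slope (`beta0` and `beta1` are illustrative assumptions, not estimates from this study). It shows that the ratio of expected counts for a one-unit increase in the predictor equals the exponential of the slope, whatever the starting value \(x_{0}\).

```python
import math


def expected_count(beta0, beta1, x):
    """Conditional mean under a log link: mu = exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)


def incidence_rate_ratio(beta0, beta1, x0):
    """Ratio of expected counts at x0 + 1 and x0, as in Equation (3)."""
    return expected_count(beta0, beta1, x0 + 1) / expected_count(beta0, beta1, x0)


# Illustrative parameter values (assumptions, not estimates from the paper).
beta0, beta1 = 1.5, -0.4

for x0 in (0.0, 2.0, 10.0):
    irr = incidence_rate_ratio(beta0, beta1, x0)
    # The ratio does not depend on x0 and matches exp(beta1).
    print(f"x0 = {x0:>4}: IRR = {irr:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```

A negative slope gives an IRR below 1 (fewer expected cases) and a positive slope gives an IRR above 1 (more expected cases).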

## B. Data

In this study, the Government Stringency Index (*GSI*) and the number of tests conducted per 1,000 people (*TEST*) are used as proxies for government policy; the number of beds per 10,000 people (*BEDS*) and the number of physicians per 10,000 people (*PHY*) are used as proxies for health capacity; population density (*PD*), median age of the population (*AGE*) and the proportion of people with pre-existing health conditions (*NCD*) are demographic factors; and GDP per capita (*GDP*) and the Human Development Index (*HDI*) represent socio-economic factors. These are the explanatory variables in this study. The dependent variable is the number of confirmed new COVID-19 cases (*CC*). Daily data for the variables are compiled from the Johns Hopkins University database and the Oxford University Online Repositories for the period 15 March 2020 to 30 September 2020. The countries analysed are India, the USA, Brazil, Argentina, France, Colombia, Russia, Israel, the UK, and Peru.

# III. Empirical Findings

The coefficient estimates and the *IRR* values for the negative binomial regression, for both time periods, are given in Table 1 below.

Table 1 shows that the *IRR* estimates are statistically significant for all the variables in Model 1. The results show that, for a unit increase in *GSI*, *TEST*, *BEDS* and *PHY*, keeping the other variables in the model constant, the expected *CC* would change by a factor of 0.673 (*GSI*), 0.184 (*TEST*), 0.782 (*BEDS*) and 0.175 (*PHY*), respectively; since each factor is below 1, this is a decrease.
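To make these magnitudes easier to read, an *IRR* can be restated as a percentage change in the expected count, \((IRR - 1) \times 100\). The sketch below applies this to the four *IRR* values quoted above; the helper name `percent_change` is introduced here only for illustration and is not part of the original analysis.

```python
def percent_change(irr):
    """Percentage change in the expected count implied by an IRR."""
    return (irr - 1.0) * 100.0


# IRR values as quoted in the text for Model 1.
reported_irr = {"GSI": 0.673, "TEST": 0.184, "BEDS": 0.782, "PHY": 0.175}

for name, irr in reported_irr.items():
    # A negative percentage means fewer expected new cases per unit increase.
    print(f"{name}: IRR = {irr:.3f} -> {percent_change(irr):+.1f}% expected new cases")
```

Under this reading, an *IRR* of 0.673 for *GSI* corresponds to roughly 32.7% fewer expected new cases per unit increase in the index, holding the other variables constant.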

Negative coefficient estimates for *GSI*, *TEST*, *BEDS* and *PHY* (Table 1) show that an increase in each of these variables leads to a fall in the rate of new COVID-19 cases; the magnitude of the decrease in the dependent variable, *CC*, for an incremental rise in the predictor variable is given by the *IRR* values. Similarly, for the demographic factors, an incremental increase in *NCD*, *PD* and *AGE* increases the dependent variable *CC* by a factor of 1.887, 1.145 and 7.896, respectively. The *IRR* estimates show that the median age of the population (*AGE*) is the largest contributor to the incidence of COVID-19. The *IRR* estimates for the socio-economic indicators, *GDP* and *HDI*, are significant at the 10% level, but the values are small. In Model 2, when lockdown restrictions were partially or completely relaxed, *GSI*, *HDI* and *GDP* became insignificant. This may be due to increased workforce participation in this period, especially among those working in the informal sector. All other variables remain statistically significant.

# IV. Conclusion

This study aims to understand the determinants of confirmed COVID-19 cases using a range of government response and socio-economic variables. The study focusses on the 10 most affected countries. The findings emphasise the role of the demographic characteristics of the population, as well as government stringency and testing policies, as important factors in reducing the incidence of COVID-19. Consistent with previous studies, we also find support for dynamic government lockdown policies, with periodic lockdowns effective in controlling new cases. The study suggests that the best course of action for countries with high numbers of COVID-19 cases is to continue implementing periodic lockdowns while increasing the number of COVID-19 tests. The study also suggests that these countries should strengthen their healthcare capacity, proxied in this study by the number of physicians and the number of beds, in order to meet the requirements of the vulnerable section of the population.