Business as Unusual
An Explanation of the Increase of Private Economic Activity in High-Conflict Areas in Afghanistan

Data Web Appendix


Tommaso Ciarli, Chiara Kofol, Carlo Menon
This version: Sunday 24th May, 2015

Paper abstract
We explore the relation between the change in conflict intensity and the investment in private economic activity (PEA) of nearby households in Afghanistan, exploiting a unique dataset containing geographically detailed information on conflict events and on households' activity. We identify the effect of several indicators of conflict on a range of different types of PEA, differentiating across levels of formality, sectors, and capital intensity. The results show that the level of conflict, its impact, and to a lesser extent its frequency, increase the probability that a household engages in self employment activities with lower capital intensity and in activities related to subsistence agriculture. However, the magnitude of most effects is quite small.

Introduction

This appendix describes the data used in the paper Ciarli Tommaso, Kofol Chiara, Menon Carlo, 2015, "Business as unusual. An explanation of the increase of private economic activity in high-conflict areas in Afghanistan", Working Paper, mimeo. The web appendix is divided into two main sections, respectively discussing the harmonisation of household data and the construction of the conflict data.

Household Data: National Risk and Vulnerability Assessment (NRVA)

1.1 Household Surveys and Sampling

    NRVA covers a large number of households (11,760 households in 2003, 30,826 in 2005, 20,668 in 2007/2008) over the whole of Afghanistan, from 2003 to 2008 (in three waves). Data were stratified into different agro-ecological zones. Households were randomly selected within wealth groups. Whenever possible, seven households were surveyed within each randomly selected village. However, there are large differences among the waves, both in sampling and in questionnaire design, that needed to be harmonised. Here we describe the sampling strategies for the different surveys.1 In Section 1.2.2 we briefly describe the harmonisation procedure.

NRVA 2003

    This wave of data differs from the 2005 and 2007/2008 ones both in the structure of the survey and in the sampling design. The sample frame, which relied on a village list from the World Food Programme (WFP), is not available. Given the aim of the survey, the data collection is probably biased towards larger rural settlements. The four levels of data collection are: district, community (shura), wealth group, and household. In addition to the household interviews, data were also collected at the community and district levels. Female interviewers (who collected food consumption data, which are not relevant to our analysis) were not always involved in the south, in most eastern districts, and in urban areas -- only rural areas and the Kuchi (nomadic) population were interviewed by females.
    The survey took place over 3 months, covering only one season, all 32 provinces (at the time), all 368 districts (at the time), 1853 villages, 5559 wealth groups, 11757 rural households, and 85577 individuals. Stakeholders participated in the questionnaire design only partially; they included, among others, the Ministry of Labor and Social Affairs, the Ministry of Economics, and the Afghan Central Statistical Office (CSO).
    The questionnaire has a unique format: some data were entered manually, while other data were transcribed into Teleform format (software that extracts data from paper questionnaires) and scanned. Several trainers were involved in training the enumerators, which may have resulted in variation in enumerator performance.
    The survey collects information on basic demographics, health, housing, household assets, migration, labour, risk exposure and response, livestock ownership, agricultural activities, and household food consumption. For the definition of PEA we use the labour section. The questionnaire is no longer available at the CSO, but can be made available by the authors on request.

NRVA 2005

    The sample frame was made available from the CSO pre-census household listing. The sampling is proportional to population, except in the smaller provinces and urban centres, where over-sampling ensured the enumeration of a sufficient number of households. The sample selection is based on a random selection from geographically ordered Primary Sample Units (PSUs), to give a random spread representing the spatial distribution of the population. The household selection was based on the random start method in the randomly selected villages, where twelve households were interviewed.
    The data was collected at three different levels: district, community, and household. Female interviews took place in all provinces except Zabul. The survey covers both rural and urban areas as well as the Kuchi population.
    As for 2003, the survey took place over 3 months (June-August), covering only one season, during or immediately after the harvest. As this is a time of the year when high consumption patterns are expected, the data may produce seasonally biased results and poverty estimates that are low compared to the annual average and to several other months. We do not use these sections in this paper. The survey covers all 34 provinces, all 392 districts, 2597 clusters, 30822 households, and 221586 individuals. All stakeholders participated fully in the questionnaire design: the Afghan Government (CSO-MRRD), the European Commission (EC/NSS) (main donor), the World Food Programme (WFP), and the United Nations Children's Fund (UNICEF).
    The survey includes more information than in 2003, including sections on remittances, HIV/AIDS, maternal and child health, household non-food consumption, and income sources -- questionnaires available here. The questionnaires were Teleform (software that extracts data from paper questionnaires) scannable, with data quality routines built in. The enumerators received uniform training from 2 trainers who covered the whole country. More than 500 field staff were employed part-time. The survey was managed by the Government (CSO-MRRD) and funded by EC/NSS (main donor), WFP, and UNICEF.

NRVA 2007/8

    This wave is in many ways similar to the NRVA 2005, with the explicit intention of allowing comparisons. As in 2005, a process of stakeholder consultation provided inputs to further improve the survey and the questionnaire design. In order to facilitate the stakeholder consultation, two workshops were held in January and March 2007. The draft questionnaires were tested twice in the field, and a pilot test of the questionnaires took place in five regions for further and final improvements. The questionnaire format was designed in Teleform (software that extracts data from paper questionnaires) to allow for data scanning. Around 1.6 million questionnaire pages were completed and scanned.
    The fieldwork started in mid-August 2007 and lasted until the end of August 2008. Unlike the 2003 and 2005 NRVA, the seasonality bias was removed by conducting the survey during all 12 months.
    The sample frame came from the updated CSO pre-census household listing. The sample is proportional to population, with over-sampling of smaller provinces and urban centres. The sample selection is based on the random start method, to obtain a better geographic distribution of the sample. Households within selected villages were randomly selected from the CSO household listing. Eight households in each village were surveyed.
    The data was collected at three levels: district, community (shura), and household. Female interviewers participated in all provinces except for Urozgan (good female coverage). The survey covers both rural and urban areas and the Kuchi (nomadic) population.
    The survey covers all 34 provinces, 395 districts, 2572 clusters, 20576 households, and 152262 individuals. The stakeholders fully participated in the questionnaire design: the CSO-MRRD, the EC/NSS, the WFP, the Department for International Development (DFID), the Asian Development Bank (ADB), UNICEF, and the World Bank (WB).
    With respect to 2003 and 2005, a few more sections were added: disabilities, labour market participation (available in 2003), infant and under-five mortality, and women's position. The 2007/8 survey also includes more details on household food and non-food consumption, and it dropped the section on HIV/AIDS -- questionnaires available here. The enumerators received uniform training in a single session for all field staff for the whole country; the training was more detailed and longer (17 days) than in previous years. 156 field staff were selected. The method of staff selection was more transparent (short listing, test, and interview from 12 thousand applicants). The survey was managed by the Afghan Government (CSO-MRRD) and funded by EC/NSS (main donor), WFP, the Department for International Development (DFID), the Asian Development Bank (ADB), UNICEF, and the World Bank (WB).

1.2 Households Data Harmonisation and Construction

    As described in Section 1.1, the methodologies used for data collection, as well as the questionnaires, were different across waves. The harmonisation is easier for 2005 and 2007/8, which use the same sampling and a more similar questionnaire. The comparison with 2003 is more complicated, particularly because of the different sampling, and is used in this paper as a robustness check providing a longer time variation.
    Below we briefly describe the harmonisation procedure that we followed, starting with the main control variables (1.2.1), and then focusing at more length on the dependent variable, computed from the different sources of PEA (1.2.2). In the last two parts we summarise how we harmonised the differences that were introduced in the Afghan sub-national administrative boundaries (Provinces and Districts) (1.2.3) and how we geolocalised the villages (and their households) (1.2.4).

1.2.1 Control Variables

    For 2003 we were able to match only a subset of the control variables which were comparable with 2005 and 2007/08. All control variables are summarised in Table 1, where we indicate when the variable was computed also for 2003.
 



Control variables  Description                                                          2003
-----------------  -------------------------------------------------------------------  ----
HHMemb2            = 1 if HH members are < 2                                            Yes
HHMemb5            = 1 if HH members are < 5 & > 2                                      Yes
HHMemb10           = 1 if HH members are < 10 & > 5                                     Yes
HHMemb15           = 1 if HH members are < 15 & > 10                                    Yes
HHMemb20           = 1 if HH members are < 20 & > 15                                    Yes
MaleH              = 1 if the household head is a male                                  Yes
AgeHH              Age of the HH head                                                   Yes
GenderAvHH         Average gender of the HH                                             Yes
LiteracyH          = 1 if the HH head is literate                                       Yes
LiteracyAvHH       Average literacy of the HH members                                   Yes
hhassets           Number of assets in the HH                                           Yes
Rural              = 1 if the HH lives in a rural area                                  No
Credit_Inst        = 1 if the HH obtained credit the previous year: credit institution  No
Credit_Lender      = 1 if the HH obtained credit the previous year: private lender      No
Credit_Inform      = 1 if the HH obtained credit the previous year: informal source     No
Credit_Other       = 1 if the HH obtained credit the previous year: other sources       No
Credit_None        = 1 if the HH did not obtain credit the previous year                No
Loan               = 1 if the HH obtained credit the previous year                      No
HHMigration        = 1 if any HH member migrated the previous year                      Yes
shocks             = 1 if the HH experienced a shock in the previous year               Yes
Dremittances       = 1 if the HH received remittances the previous year                 No
DSocialContr       = 1 if the HH received any social aid the previous year              No
RoadKm             Km from the closest road                                             No
DElectrNo          = 1 if the HH has no access to electricity                           No
DMkt_Close         = 1 if the HH is close to the market                                 No
Aunemp_ratio       % of unemployed adults (older than 13) in the cell                   Yes
perc_opium_act     % of households cultivating opium in the cell                        No



Table 1: Control variables harmonised across waves. All the variables were available for 2005-2007/8

    The relevant choices we made in order to make the most problematic variables comparable across waves are summarised below.
    First, hhassets (2003-2007/8) counts the household assets that were recorded in all three rounds of the survey: radio, bicycle, TV, motorcycle, and car.
    Second, shocks (2003-2007/8) considers only the shocks experienced by the households that were asked about in all three rounds of the survey: unusually high level of crops, unusually high level of livestock, earthquakes, landslides, flooding, late damaging frost, hailstorms, unusually high increases in food prices, and unusual decrease in farm gate prices.
    Third, DMkt_Close (2005-2007/8) computes the presence of a close market differently for 2005 and for 2007/8. For 2005 the dummy is equal to one if, either in winter or in summer, the market is in the same village where the interviewed households live, or if it takes less than one hour to reach it on foot, by public transport, or by private vehicle (male shura questionnaire, Section 3). For 2007/8 the dummy is equal to one if the market is in the same village as the interviewed households (male community questionnaire, Section 3).
    Fourth, Aunemp_ratio (2005-2007/8) is the average unemployment ratio in the area (district or cell). Because the section on individual employment is missing in the 2005 questionnaire, for this year we imputed the average of the area unemployment ratios in 2003 and 2007/8.
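
    The imputation of the 2005 ratio can be illustrated with a minimal sketch. It assumes that the imputed value is the simple mean of the 2003 and 2007/8 area ratios; the function name, the area codes, and the numbers are hypothetical, and skipping areas observed in only one wave is an assumption of the sketch, not a rule stated in this appendix.

```python
def impute_2005_ratio(ratio_2003, ratio_2008):
    """Impute the 2005 area unemployment ratio from the two adjacent waves.

    Both arguments map an area code (district or cell) to the share of
    unemployed adults. Areas missing from either wave are skipped, which
    is an assumption of this sketch.
    """
    imputed = {}
    for area, r03 in ratio_2003.items():
        if area in ratio_2008:
            # Simple mean of the two observed waves.
            imputed[area] = (r03 + ratio_2008[area]) / 2
    return imputed


# Hypothetical area codes and ratios, for illustration only.
ratio_2003 = {"A": 0.10, "B": 0.30, "C": 0.20}
ratio_2008 = {"A": 0.20, "B": 0.10}
ratio_2005 = impute_2005_ratio(ratio_2003, ratio_2008)
```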

1.2.2 Private Economic Activity

    The definition of the household's PEA is built using the main source of household income, for 2005-2007/8. Both waves use the same options, which makes the comparison seamless and allows us to define a number of different types of PEA. However, in order to harmonise the information on PEA with the 2003 wave as well, we faced two main challenges.
    First, the 2003 questionnaire is less detailed, and the information in the labour section (Section F) allows us to identify only four PEA comparable with the information from the 2005-2007/8 surveys: business and self employment, each of which can be further divided into agricultural and non-agricultural.
    Second, the 2003 survey does not contain information on the sources of household income, but collects information on the household members' employment (including self employment). On the other hand, in 2005 there is no section collecting labour information. Fortunately, in the 2007/8 survey both sections were available: income sources, as in 2005 (Section 8), and employment, as in 2003 (Section 9). This allowed us to compare two different measures of the main household occupational choice: one reflecting the main source of household income (comparable with 2005), and one reflecting the activity in which each household member was employed (comparable with 2003). Correlating measures of self employment from the two sources allowed us to choose the definition of self employment for 2003 (defined through the employment section) that is closest to the corresponding definition used for 2005-2007/8 (defined through the income section).
    Table 2 reports the correlation between the level of self employment in non agricultural activity, measured as the main source of income (se_na -- see definition below), and different measures of the level of self employment, measured as the relative number of individuals working as self employed in non agricultural activities. Both levels are computed for the 2007/8 NRVA survey, using the income and labour sections, respectively. The measures are listed in Table 2.


Table 2: Correlations between self employment in non agriculture from different sections of the questionnaire. We use the income information and different measures of self employment from the labour section of the questionnaire. Source: own elaboration on NRVA survey  2007/8

Variables                  (1)     (2)     (3)     (4)
-------------------------  ------  ------  ------  ------
Labour section variables
hh_se_nagric_p (1)         1.000
abs_majority_p (2)         0.562   1.000
rel_majority_p (3)         0.659   0.600   1.000
Income section variable
se_na (4)                  0.515   0.344   0.377   1.000

    Table 2 shows that we obtain the highest correlations when in the labour section we define self employment considering that at least one member of the household is self employed in a non agricultural activity. Therefore, for 2003 one member of the household is enough to define the households as self employed. In this way we should capture households occupational choice in a way that is close to the 2005 and 2007/8 definition where we have the information on the income sources.
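
    The selection logic described above can be sketched as follows: compute the correlation between the income-based indicator and each candidate labour-based indicator, and keep the candidate with the highest correlation. This is only an illustration; the function names, the candidate labels, and the toy household data are hypothetical, and the Pearson coefficient is an assumption about the correlation measure used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def best_match(reference, candidates):
    """Return the candidate most correlated with the reference, plus all scores."""
    scores = {name: pearson(values, reference) for name, values in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy household-level dummies (hypothetical, six households).
se_na = [1, 0, 1, 0, 1, 0]                   # income-section indicator
candidates = {
    "any_member":   [1, 0, 1, 1, 1, 0],      # at least one member self employed
    "abs_majority": [0, 0, 1, 0, 0, 0],      # most members self employed
}
best, scores = best_match(se_na, candidates)
```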
    At the end of the harmonisation process of the PEA variables across the different waves, we obtain two different groups of variables. First, the longest panel (2003-2005-2007/8) includes the following definitions of PEA for all years.

Income source                               Self employment types
                                            (1)     (2)     (3)     (4)     (5)      (6)
                                            se_na   Low_K   High_K  agric   agr_sub  agr_sale
------------------------------------------  ------  ------  ------  ------  -------  --------
Crop production for home consumption                                Yes     Yes
Livestock production for home consumption                           Yes     Yes
Production & sale of field crops                                    Yes              Yes
Prod & sales of cash crops (except Opium)                           Yes              Yes
Prod & sales of orchard products                                    Yes              Yes
Prod & sales of livestock & products                                Yes              Yes
Sales of prepared foods                     Yes     Yes
Miller                                      Yes             Yes
Petty trade/ shopkeeping                    Yes     Yes
Cross border trade                          Yes             Yes
Firewood /charcoal sales                    Yes     Yes
Handicrafts (sewing, embroidery, etc)       Yes     Yes
Carpet weaving                              Yes     Yes
Taxi/transport                              Yes             Yes

Table 3: List of the sources of income considered as self-employment for 2005 and 2007/8. Source: own elaboration on NRVA data
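
    As an illustration of how the classification in Table 3 translates into household-level dummies, the sketch below maps a household's main income source to the six PEA variables. The mapping is transcribed from our reading of Table 3; the function name and the dictionary layout are hypothetical and are not the code used to build the data.

```python
# PEA variables, in the column order of Table 3.
PEA_VARIABLES = ("se_na", "Low_K", "High_K", "agric", "agr_sub", "agr_sale")

# Income source -> PEA dummies switched on (transcribed from Table 3).
PEA_BY_INCOME_SOURCE = {
    "Crop production for home consumption": {"agric", "agr_sub"},
    "Livestock production for home consumption": {"agric", "agr_sub"},
    "Production & sale of field crops": {"agric", "agr_sale"},
    "Prod & sales of cash crops (except Opium)": {"agric", "agr_sale"},
    "Prod & sales of orchard products": {"agric", "agr_sale"},
    "Prod & sales of livestock & products": {"agric", "agr_sale"},
    "Sales of prepared foods": {"se_na", "Low_K"},
    "Miller": {"se_na", "High_K"},
    "Petty trade/ shopkeeping": {"se_na", "Low_K"},
    "Cross border trade": {"se_na", "High_K"},
    "Firewood /charcoal sales": {"se_na", "Low_K"},
    "Handicrafts (sewing, embroidery, etc)": {"se_na", "Low_K"},
    "Carpet weaving": {"se_na", "Low_K"},
    "Taxi/transport": {"se_na", "High_K"},
}


def pea_dummies(main_income_source):
    """Return the six PEA dummies for a household's main income source.

    Sources not listed in Table 3 (for example wage labour) yield all zeros.
    """
    active = PEA_BY_INCOME_SOURCE.get(main_income_source, set())
    return {var: int(var in active) for var in PEA_VARIABLES}


miller = pea_dummies("Miller")
```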

    In Figure 1 we plot the distribution of the four PEA types across districts in 2003, 2005 and 2007/8. The similarity of the distributions across years suggests that the harmonisation of the PEA variable captures similar household activities, allowing for a comparison across all years.


[Figure 1, three panels: (a) Small business (bus); (b) Self employment in agriculture (agric); (c) Non agricultural self employment (se_na)]

Figure 1: Comparison of the distribution of PEA density by district across the three NRVA waves. Kernel density estimations for the PEA variables that can be computed for all years, 2003, 2005, and 2007/8. Source: own calculations based on NRVA data

    The figure also shows that 2005 has the highest density of households holding a self employment activity. This is due to a significant difference in the number of self employment activities in agriculture, whereas the number of self employed in non-agricultural activities is lower in 2005.
    Second, for the shorter 2005-07/8 database, we can also include the following more detailed definitions of PEA:
    Finally, for the 2005-07/8 database we also identified residual occupational variables, used to investigate whether a lower intensity of PEA translated into other occupational choices. The NRVA questionnaires allowed us to identify the following three residual categories:

Income source                                                      Paid work, opium & other income sources
                                                                   (1)       (2)        (3)
                                                                   Wage_Inc  Opium_Inc  Other_Inc
-----------------------------------------------------------------  --------  ---------  ---------
Agricultural wage labour (Non Opium)                               Yes
Other wage labour                                                  Yes
Skilled labour                                                     Yes
Salary/Government job/Teacher/NGO/UN                               Yes
Military service                                                   Yes
Production & sale of opium                                                   Yes
Opium wage labour                                                            Yes
Shepherding                                                                             Yes
Mining                                                                                  Yes
Remittances from seasonal migrants                                                      Yes
Remittances from family members living permanently away from home                       Yes
Pension                                                                                 Yes
Other Government benefits                                                               Yes
Rental income                                                                           Yes
Sale of food aid                                                                        Yes
Begging                                                                                 Yes
Borrowing                                                                               Yes
Other                                                                                   Yes

Table 4: List of the sources of income not considered as PEA for 2005 and 2007/8. Source: own elaboration based on NRVA questionnaire

1.2.3 Afghan Districts

    The administrative boundaries of Afghan districts and provinces were subject to changes in 2005. We harmonised the district boundaries of 2003 (392 districts) using the new 2005 administrative division (398 districts). In our analysis we use a partition of Afghanistan into 398 districts for all waves.
    In particular, we re-assigned 2003 households to the newer 2005 districts using the village latitude and longitude and a shape-file provided by the Afghanistan Information and Management Services (AIMS). The allocation of households was implemented using ArcGIS.
    For 2005-2007/8 we kept the districts assigned by the CSO, but we matched their codes with the ones assigned through ArcGIS to 2003 (district_gis) using the district names, in order to obtain homogeneous codes for all three NRVA waves.
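
    The re-assignment of villages to districts is a point-in-polygon operation. The allocation itself was carried out in ArcGIS; the sketch below only illustrates the underlying idea with a standard ray-casting test in plain Python. The function names, district codes, and square "districts" are hypothetical, and real district boundaries would be read from the AIMS shape-file.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_district(lon, lat, districts):
    """Return the code of the first district containing the point, else None."""
    for code, polygon in districts.items():
        if point_in_polygon(lon, lat, polygon):
            return code
    return None


# Hypothetical square districts, not real boundaries.
districts = {
    101: [(0, 0), (1, 0), (1, 1), (0, 1)],
    102: [(1, 0), (2, 0), (2, 1), (1, 1)],
}
village_district = assign_district(1.5, 0.5, districts)
```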

1.2.4 Geo-References for 2007/2008 Villages

    The analysis in this paper exploits the geographical and time variation of households and of the conflict. Until 2005, NRVA supplied data with the geo-location of the villages, which makes the geographical analysis very attractive with these data. The CSO policy changed with the 2007/8 wave, in which the village geo-references were not included. We therefore assigned a geocode to each village (and to each of the households sampled from the village). No codebook is available online, but a nearly complete geocode codebook can be purchased from AIMS.
    We assigned the coordinates to the 2007/8 villages with the following procedure. First, we matched the village geocodes in our sample with the AIMS geo-referenced geocodes. Next, we matched the unmatched villages using the geocodes in the 2005 wave, for those villages that were sampled in both waves and that were located in the same district and province. Finally, we matched the few remaining villages with the geographic gazetteer provided by Humanitarian Response -- available online here -- and with that provided by AIMS (matching the village, district, and province names). After this procedure, only 48 households surveyed in 2007/8 were left without geographical coordinates.
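
    The three matching steps above form a fallback cascade, which can be sketched as follows. The lookup tables, key layouts, geocodes, and coordinates are hypothetical; the sketch only shows the order in which the sources are consulted.

```python
def geolocate_villages(villages, aims_by_geocode, coords_2005, gazetteer_by_name):
    """Assign coordinates to villages with a three-step fallback.

    villages: list of dicts with keys 'geocode', 'village', 'district',
    'province'. Returns (coordinates keyed by geocode, unmatched geocodes).
    """
    located, unmatched = {}, []
    for v in villages:
        admin_key = (v["province"], v["district"], v["geocode"])
        name_key = (v["province"], v["district"], v["village"])
        if v["geocode"] in aims_by_geocode:        # step 1: AIMS geocodes
            located[v["geocode"]] = aims_by_geocode[v["geocode"]]
        elif admin_key in coords_2005:             # step 2: same village in 2005
            located[v["geocode"]] = coords_2005[admin_key]
        elif name_key in gazetteer_by_name:        # step 3: gazetteer, by names
            located[v["geocode"]] = gazetteer_by_name[name_key]
        else:
            unmatched.append(v["geocode"])
    return located, unmatched


# Hypothetical inputs, for illustration only.
villages = [
    {"geocode": "V1", "village": "Alpha", "district": "D1", "province": "P1"},
    {"geocode": "V2", "village": "Beta", "district": "D1", "province": "P1"},
    {"geocode": "V3", "village": "Gamma", "district": "D1", "province": "P1"},
    {"geocode": "V4", "village": "Delta", "district": "D1", "province": "P1"},
]
aims_by_geocode = {"V1": (34.5, 69.2)}
coords_2005 = {("P1", "D1", "V2"): (34.6, 69.3)}
gazetteer_by_name = {("P1", "D1", "Gamma"): (34.7, 69.4)}
located, unmatched = geolocate_villages(
    villages, aims_by_geocode, coords_2005, gazetteer_by_name
)
```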

Conflict Data: Afghan War Diaries

2.1 Afghan War Diaries

    The Afghan War Diaries (AWD) is a large dataset of conflict reports recorded during the Afghan and the Iraq wars between 2004 and 2009 by US troops. All reports contain a large amount of detail on each registered event, including the geographical coordinates, the number of people (soldiers and civilians) killed and wounded, and a description of the action in which the military were involved. The data were collected by soldiers and intelligence officers, and include intelligence information, reports of meetings with political partners, and related details. Most of the reports were not cleared, which is likely to reduce the risk of misreported events. The reports were assigned to one among dozens of different categories that differentiate the types of action, going from the Afghan Police training through indirect fire and police actions, up to vehicle interdiction (please refer to the war diaries website for details).
    Immediately after their release, the reports were machine coded into a large database detailing a large number of variables, including geographic coordinates, number of people involved and killed or wounded, types of action, perpetrators, etc. (see for example the Guardian). A number of studies have verified the reliability and the accuracy of these conflict data (see for example the discussion in Zammit-Mangion et al. (2012)).

    In order to use these data for our analysis, first, we define the 'relevant' conflict events as those events that may cause disruption of economic activity, or fear, or any other condition that we think could affect households' behaviour.2 We do so using the conflict category (see also https://www.wikileaks.org/afg/), and excluding categories such as unexploded bombs or medical interventions. We list all the categories forming the set of 'relevant' conflict events in Table 5. We use the events in the excluded categories to define a different variable (no_conflict) identifying the presence of US military, to control for whether military activity, and related aid, with little impact on the perception of violent conflict, affects private economic activity. This way we also make sure that we are fully exploiting the richness of the data.
    Second, we assigned each conflict event to an area (district and cell) to construct the different aggregate measures of conflict per area.
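
    The two steps can be sketched as a filter followed by a count per area and year. The sketch assumes that each event already carries a district code and a year; the set of categories is a small illustrative subset of Table 5, and the "MEETING" category in the toy data, the field names, and the function name are hypothetical.

```python
from collections import Counter

# Illustrative subset of the relevant categories listed in Table 5.
RELEVANT_CATEGORIES = {"AMBUSH", "DIRECT FIRE", "IED EXPLOSION", "KIDNAPPING"}


def aggregate_events(events):
    """Count relevant and excluded events per (district, year).

    Returns two Counters: n_conflict for relevant events and no_conflict
    for events in the excluded categories.
    """
    n_conflict, no_conflict = Counter(), Counter()
    for event in events:
        key = (event["district"], event["year"])
        if event["category"].upper() in RELEVANT_CATEGORIES:
            n_conflict[key] += 1
        else:
            no_conflict[key] += 1
    return n_conflict, no_conflict


# Hypothetical events, for illustration only.
events = [
    {"district": 101, "year": 2006, "category": "AMBUSH"},
    {"district": 101, "year": 2006, "category": "Direct Fire"},
    {"district": 101, "year": 2006, "category": "MEETING"},
    {"district": 102, "year": 2007, "category": "IED EXPLOSION"},
]
n_conflict, no_conflict = aggregate_events(events)
```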



Activities                Definition
------------------------  ----------------------------------------------------------------------
AIR ASSAULT               conflict air operations
AMBUSH                    ambushes that most of the times end up with wounded/killed and with explosions
AMF-ON-ANA                events where a fire, even if friendly, occurred
ANA-ON-ANP                events where a fire, even if friendly, occurred
ARSON                     actions where buildings/infrastructures were set on fire
ASSASSINATION             events where people were killed
ATTACK                    events where someone was attacked. Does not necessarily involve wounded/killed
BLUE-GREEN                events where there is a fire
BLUE-BLUE                 events where there is a fire
BLUE/WHITE                events where there is a fire
BREACHING                 events with fire and possibly casualties
CARJACKING                mainly enemies hijacking cars or other private vehicles
CCA                       diverse suspicious events
CAS                       events where helicopters are involved in the attack
CLOSE AIR SUPPORT         events where helicopters are involved in the attack
COUNTER INSURGENCY        violent actions
COUNTER MORTAR FIRE       events where there is a fire
CRIMINAL ACTIVITY         it can include explosions, theft, wounded journalists during attacks
DELIBERATE ATTACK         it includes diverse violent actions, sometimes with wounded/killed individuals
DIRECT FIRE               events where there is a fire
DOWNED AIRCRAFT           it describes operations where aircraft were downed
DRUG OPERATION            it can include fires and violent actions
ENEMY ACTION              it describes violent events with fire
ESCALATION OF FORCE       it describes violent actions with possibly wounded/killed
GREEN-BLUE                it describes events where there is a fire
GREEN-GREEN               it describes events where there is a fire
GREEN-WHITE               it describes events where there is a fire
DF COUNTER FIRE           it describes events where there is a fire
KIDNAPPING                it describes operations where someone was kidnapped
LOOTING                   it describes operations where looting took place
MINE STRIKE               events where there is an explosion
MURDER                    it describes operations where someone was murdered
IED AMBUSH                attack on US army using Improvised Explosive Device
IDF INTERDICTION          prediction of a future fire/bombing that has not happened yet
IED FOUND/CLEARED         IED detonated by the US military
IED EXPLOSION             mainly bombs, or suicide bombs against military and civilians
INTERDICTION              suicide bombers are spotted and blocked, arrested, or killed; sometimes the IED explodes
SNIPER OPERATIONS         fire starts from a hidden place
TRIBAL                    fire events. Violent tribal disputes
TRIBAL FEUD               violent tribal disputes
UAV                       (Unmanned Aerial Vehicle) mixed events that can include fire, wounded/killed
POLICE ACTIONS            they can be either violent or not. They can include fire
MEDEVAC (LOCAL NATIONAL)  medical interventions
MINE FOUND/CLEARED        non-violent event
MOVEMENT TO CONTACT       movement in order to contact the enemy. It can be violent but not always
MUGGING                   it describes operations where someone was mugged
NARCOTICS                 disruption of major drug labs
NBC                       event that describes a show of force
NONE SELECTED             diverse events, some of them violent
OTHER                     conflict related event, with fire, or explosion
OTHER (HOSTILE ACTION)    events such as kidnapping/killing/robbery
OTHER DEFENSIVE           it can include fire/violent events
OTHER OFFENSIVE           it can include fire/violent events
POLICE INTERNAL           violent events with fire, wounded/killed
PLANNED EVENT             mixed evidence but mostly violent events
PREMATURE DETONATION      explosive events
RAID                      violent events with possibly wounded/killed individuals
RPG                       rocket-propelled grenade actions
SAFIRE                    surface to air fire
SEARCH AND ATTACK         violent actions with possibly wounded/killed individuals
SECTARIAN VIOLENCE        violent events such as suicide bombers
SHOW OF FORCE             it reports either battle events or events where there is a fire
SMALL UNIT ACTIONS        violent actions possibly with direct fire, possibly with wounded/killed individuals
SNIPER OPS                fire started from a hidden place
UNKNOWN EXPLOSION         explosive event
VANDALISM                 diverse disruptive events
VOGE                      visual observation of ground explosion

Table 5: List of categories included among the relevant conflict events. Source: war diaries website


2.2 Global Dataset on Events, Location and Tone (GDELT)

    GDELT is a database that archives and assigns geographical coordinates to all events reported in the news from 1979 to date (see the web page for details: http://gdeltproject.org/data.html). We used a database containing conflict events from 1979 to 2012, similar to the one used in Yonamine (2013). Events related to conflict are classified as material cooperation, verbal cooperation, verbal conflict, and material conflict.
    In order to use these data for our analysis we first delete duplicate conflict events. Next, we assign each conflict event to an area (cell and district). Finally, we define one measure of conflict per area, which sums all the events recorded in one year in that area. To reduce the noise from events that may not be related to the conflict, and that may have negligible effects on household behaviour, we included in the count only the events that were classified as 'material conflicts'.
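
    These steps can be sketched as follows. In GDELT, material conflict corresponds to QuadClass code 4; everything else in the sketch is an assumption for illustration, in particular the fields used to identify duplicates, the record layout, and the function name. Events are assumed to carry an area code already.

```python
from collections import Counter

MATERIAL_CONFLICT = 4  # GDELT QuadClass code for material conflict


def count_material_conflicts(events):
    """Drop duplicate events, keep material conflicts, count per (area, year).

    Two events are treated as duplicates when they share date, coordinates,
    and event code; this identity rule is an assumption of the sketch.
    """
    seen = set()
    counts = Counter()
    for e in events:
        identity = (e["date"], e["lat"], e["lon"], e["event_code"])
        if identity in seen:
            continue
        seen.add(identity)
        if e["quad_class"] == MATERIAL_CONFLICT:
            year = e["date"][:4]  # dates are 'YYYY-MM-DD' strings here
            counts[(e["area"], year)] += 1
    return counts


# Hypothetical records, for illustration only.
events = [
    {"date": "2006-05-01", "lat": 34.5, "lon": 69.2, "event_code": "190", "quad_class": 4, "area": 101},
    {"date": "2006-05-01", "lat": 34.5, "lon": 69.2, "event_code": "190", "quad_class": 4, "area": 101},
    {"date": "2006-07-12", "lat": 34.5, "lon": 69.2, "event_code": "112", "quad_class": 3, "area": 101},
    {"date": "2006-09-30", "lat": 34.6, "lon": 69.1, "event_code": "193", "quad_class": 4, "area": 101},
]
n_event4 = count_material_conflicts(events)
```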

2.3 Geographical Distribution of Conflict Events and Impact

    In order to control for different sources of information we construct a number of measures of the intensity of conflict, covering two or more of the HH survey years. Below we compare the geographical distribution through time of the conflict using the different measures (all normalised by population). Figure 2: number of conflict events recorded by the US army (n_conflict); Figure 3: number of material conflict events recorded by the media (n_event4); Figure 4: number of individuals killed or wounded (n_wk); Figure 5: number of U.S. soldiers killed or wounded (n_wk_usa); Figure 6: number of Afghan civilians wounded or killed (n_wk_civ); Figure 7: number of insurgents wounded or killed (n_wk_ins); Figure 8: percentage of days in a year in which no relevant conflict occurs in the district (peace_days); and Figure 9: percentage of households that experience a shock related to violence and insecurity (p_shockins).
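
    Two of these measures lend themselves to a short illustration: the share of days without relevant conflict (peace_days) and the normalisation of a count by population. The sketch below assumes that peace_days is computed from the distinct calendar days with at least one relevant event and that normalisation is a simple division by the district population; the function names and the example dates are hypothetical.

```python
from calendar import isleap
from datetime import date


def peace_days(event_dates, year):
    """Percentage of days in `year` with no relevant conflict event.

    event_dates is an iterable of datetime.date objects for one district;
    several events on the same day count as a single conflict day.
    """
    days_in_year = 366 if isleap(year) else 365
    conflict_days = {d for d in event_dates if d.year == year}
    return 100.0 * (days_in_year - len(conflict_days)) / days_in_year


def per_capita(count, population):
    """Normalise an event or casualty count by the district population."""
    return count / population


# Hypothetical event dates for one district.
dates = [date(2008, 1, 5), date(2008, 1, 5), date(2008, 3, 1), date(2007, 12, 31)]
share_2008 = peace_days(dates, 2008)
```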



Figure 2: Number of relevant conflicts recorded by the US army per district, 2005-2008. Conflicts are normalised by the district population. Source: own calculations based on AWD
Figure 3: Number of conflicts recorded by the media per district, 2003-2008. Conflicts are normalised by the district population. Source: own calculations based on GDELT

Figure 4: Number of total individuals wounded and killed recorded by the US army per district, 2005-2008. Wounded and killed are normalised by the district population. Source: own calculations based on AWD
Figure 5: Number of U.S. soldiers wounded and killed recorded by the US army per district, 2005-2008. Wounded and killed are normalised by the district population. Source: own calculations based on AWD

Figure 6: Number of civilians wounded and killed recorded by the US army per district, 2005-2008. Wounded and killed are normalised by the district population. Source: own calculations based on AWD
Figure 7: Number of insurgents wounded and killed recorded by the US army per district, 2005-2008. Wounded and killed are normalised by the district population. Source: own calculations based on AWD

Figure 8: Percentage of days in a year in which there is no relevant conflict. Density. Source: own calculations based on AWD
Figure 9: Percentage of households in a district that have experienced a shock related to violence and insecurity in t-1. Density. Source: own calculations based on AWD

2.4 Geographical Distribution: Comparing Sources and Data

    Next, we compare the geographical distribution of conflict events estimated with the variables used for this paper with the geographical distribution of conflict in Afghanistan estimated using different data sources, methods, or variables.
    The Guardian newspaper was the first to publish figures based on the Afghan Warlogs (available on the Guardian website). For instance, Figure 10 maps the distribution of IED attacks in Afghanistan from 2004 through 2009.
    The geographical distribution is very similar to the one plotted in the figures above, which use the same source of data normalised by population size. Moreover, the Guardian data also show a sharp increase in the intensity of conflict between 2005 and 2008, and its spread to initially unaffected areas in the North of the country.


Figure 10: IED attacks per year in Afghanistan. From the Afghan Warlogs data published by The Guardian newspaper. Source: Guardian website


    Also using the Afghan war diaries, Zammit-Mangion et al. (2012) find a very similar geographical distribution of the number of logs over the years (see the paper's supporting material), covering all events, not only the ones that may be considered relevant for economic decisions. Perhaps more interesting to show here is the escalation of conflict between 2004 and 2009 across Afghanistan (see the original paper for details on how this is modelled).
    In Figure 11 we reproduce Figure 2 from Zammit-Mangion et al. (2012), where they plot the weekly growth in the number of events registered in the Afghan war diaries. The figure confirms what the other figures have shown: the conflict increases mainly in the Helmand province and in the North, where no activities were registered in 2004. The smaller increase in the South, where the conflict is more pronounced, is due to the fact that the number of activities was already very high in 2004.
Figure 11: Growth of the conflict activities registered in the Afghan war diaries between 2004 and 2009. Only regions with positive overall growth. For more details about the figure see the source: Zammit-Mangion et al. (2012)


Very similar results on the number of killed and wounded by year are reported by the Visualising Data website, as shown in Figure 12.

Figure 12: Number of killed and wounded using the Afghan war diaries (2005-2008). Source: Visualising Data

    Finally, O'Loughlin et al. (2010) compare the conflict figures from the Afghan war diaries with those from the Armed Conflict Location & Event Data Project (ACLED). In Figure 13 we reproduce the authors' Figure 5, where they plot the geographical distribution of the share of violent events per province with respect to the total number of events.
    Although data availability in ACLED limits the comparison to the years 2008 and 2009, the figure shows a strong similarity in the geographical distribution of conflict captured by different data sources.

Figure 13: Share of conflicts per province with respect to the total number of conflicts in the country (2008-2009). The authors use the Afghanistan war diaries and ACLED. The restriction on the period compared is due to data availability in ACLED (2008-09). Source: O'Loughlin et al. (2010)
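    To make the comparison in Figure 13 concrete, the following minimal sketch computes each province's share of the national event total for two sources and reports the largest gap between them. The province labels, the counts and the function name are hypothetical placeholders, not values from the Afghan war diaries or ACLED, and the gap statistic is our own illustrative choice rather than a measure used by O'Loughlin et al. (2010).

```python
def provincial_shares(counts):
    """Map each province to its share of the national event total."""
    total = sum(counts.values())
    return {province: n / total for province, n in counts.items()}


# Hypothetical event counts per province from two sources (not real data).
source_a = {"Province 1": 500, "Province 2": 300, "Province 3": 150, "Province 4": 50}
source_b = {"Province 1": 90, "Province 2": 70, "Province 3": 30, "Province 4": 10}

shares_a = provincial_shares(source_a)
shares_b = provincial_shares(source_b)

# Largest absolute gap between the two sources' provincial shares.
max_gap = max(abs(shares_a[p] - shares_b[p]) for p in shares_a)
```

    Working with shares rather than raw counts lets sources with very different total numbers of recorded events be compared on the same scale.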

2.5 Time Distribution: Comparing Data Sources

We compare the distribution of conflict across districts, for different years, as it emerges from the data sources used in this paper to compute the indicators of conflict intensity: Afghan War Diaries (a), GDELT (b), and the experience of violent shocks from NRVA (c). Figure 14 shows that the different measures and data sources suggest a similar intensification of the conflict from 2003 (when available) to 2008. All measures also suggest the same skewness in the distribution, with many districts experiencing relatively low conflict and a small number of districts experiencing intense conflict.

(a) Afghan war diaries (AWD)
(b) GDELT
(c) NRVA Shock Insecurity


Figure 14: Kernel density of conflict intensity (normalised by population per district) for different years. Source: own computation based on AWD, GDELT and NRVA (different years)
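For readers unfamiliar with kernel densities such as those in Figure 14, the sketch below estimates one with a Gaussian kernel over a small set of district-level intensities. The sample values are invented to mimic a right-skewed distribution, and the bandwidth uses the normal-reference rule of thumb; both are assumptions, since this appendix does not state the kernel or bandwidth behind the figure.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb: 1.06 * sd * n^(-1/5).
        # This choice is an assumption, not taken from the paper.
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
        return total / (n * bandwidth * math.sqrt(2 * math.pi))

    return density


# Hypothetical per-capita conflict intensities for a handful of districts:
# many low values and a few high ones.
intensity_2008 = [0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 2.5, 4.0]
kde = gaussian_kde(intensity_2008)

# Evaluate the density on a regular grid from -5 to 10 for plotting.
step = 0.05
grid = [i * step for i in range(-100, 201)]
curve = [kde(x) for x in grid]
```

Repeating the estimate for each survey year and overlaying the curves gives a picture comparable to each panel of the figure: a tall peak near zero and a long right tail.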


Geographical Distribution of Conflict and Private Economic Activities

In the paper we exploit the variation over time and space of different types of entrepreneurial activities and different indicators of conflict. As an example, in Figures 15 and 16 we contrast the geographical distribution across districts of a number of household activities with that of the number of 'relevant' conflicts recorded in the AWD (normalised by the district population), for 2005 and 2008.

(a) Non agricultural self employment
(b) Agriculture for sale
(c) Subsistence agriculture
(d) Conflict intensity per capita


Figure 15: Percentage of household activity per district and conflict intensity: 2005. % of self employed in non agricultural activities per district (a); % of self employed in agriculture for sale per district (b); % of self employed in subsistence agriculture per district (c); number of conflicts per district normalised by population (d). The intensity of the colour indicates the percentage of households in the district engaged in a particular activity (a-c) and the normalised number of conflict events (d). Source: own elaboration on NRVA and AWD data.


(a) Non agricultural self employment
(b) Agriculture for sale
(c) Subsistence agriculture
(d) Conflict intensity per capita


Figure 16: Percentage of household activity per district and conflict intensity: 2008. % of self employed in non agricultural activities per district (a); % of self employed in agriculture for sale per district (b); % of self employed in subsistence agriculture per district (c); number of conflicts per district normalised by population (d). The intensity of the colour indicates the percentage of households in the district engaged in a particular activity (a-c) and the normalised number of conflict events (d). Source: own elaboration on NRVA and AWD data.



Footnotes

1 This information is available in "Summary of the National Risk and Vulnerability Assessment 2007/2008", paragraph 2, Jehoon Printing Press.
2 See O'Loughlin et al. (2010) for a similar classification. Our results do not change when we use their classification of violent events instead of our classification of 'relevant' events.

References

O'Loughlin, John, Frank D W Witmer, Andrew M Linke, and Nancy Thorwardson, "Peering into the Fog of War: The Geography of the WikiLeaks Afghanistan War Logs, 2004-2009," Eurasian Geography and Economics, 2010, 51 (4), 472-495.

Yonamine, James E, "Predicting Future Levels Of Violence In Afghanistan Districts Using GDELT," Working Paper, Penn State University, 2013.

Zammit-Mangion, Andrew, Michael Dewar, Visakan Kadirkamanathan, and Guido Sanguinetti, "Point process modelling of the Afghan War Diary," Proceedings of the National Academy of Sciences, 2012, 109 (31), 12414-12419.