The goal of watercostaccra is to provide users with documentation on two surveys covering household water costs, coping mechanisms, and water point estimates, conducted in November 2023 in Accra, Ghana. The data sets are associated with a project report completed by Elizabeth Vicario for the “data science for openwashdata” course offered by openwashdata.org.

Installation

You can install the development version of watercostaccra from GitHub with:

# install.packages("devtools")
devtools::install_github("openwashdata/watercostaccra")

Alternatively, you can download the individual data sets as a CSV or XLSX file from the table below.

dataset CSV XLSX
households Download CSV Download XLSX
waterpoints Download CSV Download XLSX

Data

The package provides access to data on household water costs, coping mechanisms, and water point estimates.

The households data set contains data about a household survey on water costs and coping strategies in Accra. It has 116 observations and 89 variables.

variable_name variable_type description
community factor the communities surveyed, options including [1] kg: Korle Gonno and [2] abuja: Abuja
housing_type factor housing type, options including [1] block_unit: unit in a row of apartments made of cement blocks, [2] wood_unit: unit in a row of apartments made of wood, [3] house, [4] compound_house: single-story L- or C-shaped house with multiple units around a shared courtyard, [5] multistory_apt: multi-story apartment building, [6] wood_shack: wooden shack, [7] no_structure, and [8] other
respondent_relationship_to_hh factor respondent’s relationship to the household head (respondent identified), options including [1] self, [2] child, [3] spouse, and [4] other_relative
gender factor gender (self-identified) of respondent, options including [1] female and [2] male
tenure factor tenure status, options including [1] rented: renter, [2] owned: homeowner, or [3] no_payment: living without payment
years_in_community integer number of years respondent has lived in community
adult_count double number of adults in household including respondent. Household is described as those “eating from the same pot”
child_count double number of children under 18 in household. Household is described as those “eating from the same pot”
rooms_in_hh double number of rooms used for sleeping. Household is described as those “eating from the same pot”
business_ownership factor household or respondent owns a business, options including [1] respondent-owned and [2] household-owned
business_location factor location type of the business, options including [1] home_based, [2] outside_home: fixed location outside home, or [3] mobile: mobile location.
business_category factor type of business, options including [1] food, [2] shop, [3] salon, [4] vented_water, [5] tailoring, and [6] other_services.
business_water_use logical respondent’s business uses water beyond typical needs of household (true or false)
business_water_source factor primary source of water for business use (packaged water, piped to home, piped to neighbor’s home, piped to compound, commercial or public tap, borehole, dug well, spring water, delivered water)
primary_dw_source factor primary source of drinking water (packaged water, piped to home, piped to neighbor’s home, piped to compound, commercial or public tap, borehole, dug well, spring water, delivered water)
dw_reason_x logical columns about respondent reasons for using drinking water source on convenience, affordability, availability, temperature, cleanliness, taste, habit or cultural norm, trustworthiness, health, other. (true or false)
package_type_preference factor package type the respondent typically purchases, options including [1] individual: sachets/packets/bottles, [2] bag: multipacks of these, or [3] both
package_size_reason_x logical columns about reasons for purchasing preferred package type on storage space in home, cost effectiveness, temperature at time of purchase, availability of money, convenience, size needed for respondent or household, avoiding wasting water by purchasing only when needed. (true or false)
dw_treatment factor treatment methods of water before drinking, options including [1] no_treatment, [2] boil, [3] boil;settle, [4] filter, and [5] settle
primary_water_source factor primary water source for non-drinking water, options including [1] packaged water, [2] piped_to_home, [3] piped to neighbor’s home, [4] piped to compound, [5] commercial or public tap, [6] borehole, [7] dug well, [8] spring water, and [9] delivered water
primary_source_reason_x logical columns about reasons for using primary source of non-drinking water on proximity to home, convenience, affordability, availability, cleanliness, other. (true or false)
other_non_dw_source_use logical respondent uses at least one source besides primary non-drinking water source (true or false)
other_non_dw_sources_x logical columns about additional water source(s) for non-drinking water on packaged water, piped to home, piped to neighbor’s home, piped to compound, commercial or public tap, borehole, dug well, spring water, delivered water. (true or false)
secondary_source_reason_x logical columns about reason for using secondary source of non-drinking water (primary source is not available, primary source is not clean, primary source is crowded, availability of shower stalls, convenient location)
tap_payment_mode factor respondent’s mechanism for paying for piped water (all respondents use piped water as a primary or secondary source). Options include [1] pay_to_fetch: paying to fetch, [2] shares_bill: sharing or paying the whole bill, or [3] both (at different taps).
daily_hh_water_cost_for_pay_to_fetch double daily estimated cost of drinking water for respondent’s household
daily_hh_water_cost_phhm_for_pay_to_fetch double daily estimated cost of drinking water for respondent’s household per household member
past_struggle_to_find_water logical respondent has struggled to find water before (defined as extreme difficulty in accessing water) (true or false)
time_of_last_struggle_to_find_water factor respondent’s last time of struggle to find water, options including [1] last_3_days, [2] last_7_days, [3] last_30_days, [4] last_year, and [5] over_year_ago.
weekdays_struggle_to_find_water double days in a week the respondent typically struggles to find or pay for water
past_struggle_primary_reason factor primary reason for past struggles to find water, options including [1] availability, [2] cost, and [3] distance: distance to nearest source.
tap_closure_knowledge_x logical columns about respondent’s knowledge about tap closures (usually known, sometimes known, expected due to patterns in closures, not known, or no answer). (true or false)
coping_mechanism_x logical columns about strategies for coping with water shortage (spending more on the same amount of water, purchasing extra water to store at home, using another source, using packaged water for cooking, skipping cooking, using packaged water for bathing, skipping bathing, closing business due to water shortage, skipping laundry). (true or false)
water_storage_drinking_water logical respondent typically stores drinking water at home (true or false)
water_storage_non_drinking_water logical respondent typically stores non-drinking water at home (true or false)
water_storage_none logical respondent typically does not store water at home (true or false)
storage_containers_x logical columns about the types of storage containers used, if respondent typically stores non-drinking water (plastic jugs also called jerry cans or Kufuor gallons, uncovered or covered barrels, other covered or uncovered containers)
estimated_non_dw_storage_capacity double estimated capacity of storage for non-drinking water (liters)
estimated_stored_non_dw double estimated actual storage of non-drinking water (liters)
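
Several entries in the table above (those ending in _x) describe families of logical columns rather than a single column. The following is a minimal sketch of how such a family could be summarised, assuming each option is stored as its own logical column sharing a common prefix (for example a hypothetical dw_reason_convenience). The helper name share_true_by_prefix is illustrative and not part of the package; check names(households) for the actual column names.

```r
# Share of TRUE answers in every logical column whose name starts with `prefix`.
# Assumption: each "_x" family is stored as one logical column per option,
# all sharing the same prefix. This helper is hypothetical, not a package function.
share_true_by_prefix <- function(data, prefix) {
  # Keep only columns matching the prefix
  cols <- names(data)[startsWith(names(data), prefix)]
  # Of those, keep only logical columns
  cols <- cols[vapply(data[cols], is.logical, logical(1))]
  # Proportion of TRUE values per column, ignoring missing answers
  shares <- vapply(data[cols], function(x) mean(x, na.rm = TRUE), numeric(1))
  # Most frequently selected option first
  sort(shares, decreasing = TRUE)
}
```

With the package loaded, a call such as share_true_by_prefix(households, "dw_reason_") would then rank the stated reasons for choosing a drinking water source, provided the columns follow that naming pattern.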

The waterpoints data set contains data about a water point survey, also conducted in Accra. It has 49 observations and 30 variables. For an overview of the variable names, see the following table.

variable_name variable_type description
id integer identification number
community factor the communities surveyed, options including [1] kg: Korle Gonno and [2] abuja: Abuja
type factor water point type, options including [1] piped_water, [2] borehole, [3] public_bath, and [4] natural_spring.
available_services factor services available at water point, options including bathing, public sale of water, toilet, or a combination of these
location factor location of the water point, options including [1] within_a_compound or [2] on_the_street: outside compound adjacent to street.
year_established integer year established
owner factor owner, options including [1] household_head, [2] household_member, [3] community_member: community member outside household, and [4] multiple_community_members: multiple community members outside household
constructor factor type of constructor, options including [1] government or [2] community_member.
managers factor type of typical manager(s) of water point, options including household head or member(s), employee(s), self-managed by customers, or a combination of these
estimated_storage_capacity_liters double estimated storage capacity in liters
average_visits_per_customer double average number of daily visits per customer
respondent_would_use_to_prepare_rice logical respondent would use this water to prepare rice, based on its quality (true or false)
perception_of_quality factor respondent’s perception of water quality, options including [1] acceptable, [2] high, and [3] low.
tap_closure_days_per_week double typical number of tap closures per week
price_25_liter_jug double current price of 25-liter jug of water (cedis)
price_20_liter_bucket double current price of 20-liter bucket of water (cedis)
price_30_liter_basin double current price of 30-liter basin of water (cedis)
avg_price_per_liter_cedis double average price per liter, calculated by averaging price per liter of known prices (cedis)
tap_closure_changes factor typical dynamics of water point management during closure (increasing prices, water point likely to close due to low storage, bathing customers have less water than when taps are flowing)
flexible_pricing logical manager adjusts price depending on amount of water needed or familiarity or need of customer (true or false)
price_increase logical price of any volume of water has increased in the last year (true or false)
CBT_sample_source factor source of sample for compartment bag test (CBT) supplied by Aquagenx (https://www.aquagenx.com/cbt-ectc/), options including [1] indirect_from_tap_(traveled_through_hose), [2] other_storage_(traveled_through_hose_or_poured_through_container), [3] storage_tank, and [4] tap.
coli_mpn double results of E. coli most probable number (MPN) test per 100 mL sample
coli_mpn_ci double results of E. coli most probable number (MPN) test per 100 mL sample - upper 95% confidence interval (CI)
coli_mpn_health_risk factor results of E. coli most probable number (MPN) test per 100 mL sample - descriptive health risk, options including [1] possibly_safe, [2] possibly_unsafe, [3] probably_saf, [4] probably_unsafe, [5] safe, and [6] unsafe.
tc_mpn double results of Total Coliforms (TC) most probable number (MPN) test per 100 mL sample
tc_mpn_ci double results of Total Coliforms (TC) most probable number (MPN) test per 100 mL sample - upper 95% confidence interval (CI)
tc_mpn_health_risk factor results of Total Coliforms (TC) most probable number (MPN) test per 100 mL sample - descriptive health risk, options including [1] unsafe, [2] possibly_unsafe, and [3] probably_unsafe.
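
The description of avg_price_per_liter_cedis says it averages the per-liter price over the known container prices. The sketch below shows one way that calculation could look. It assumes that unknown prices are stored as NA and that each price is divided by its nominal container volume (25, 20, and 30 liters). The function name avg_price_per_liter is illustrative, and the result may not match the published column exactly.

```r
# Average price per liter (cedis) across the container prices that are known.
# Assumptions: unknown prices are NA, and volumes are the nominal 25/20/30 liters.
# This is a hypothetical helper, not the code used to build the data set.
avg_price_per_liter <- function(data) {
  # Convert each container price to a price per liter
  per_liter <- cbind(
    jug    = data$price_25_liter_jug / 25,
    bucket = data$price_20_liter_bucket / 20,
    basin  = data$price_30_liter_basin / 30
  )
  # Row-wise mean over the known prices only
  avg <- rowMeans(per_liter, na.rm = TRUE)
  # rowMeans() returns NaN when every price in a row is missing; report NA instead
  avg[is.nan(avg)] <- NA_real_
  avg
}
```

Comparing avg_price_per_liter(waterpoints) with waterpoints$avg_price_per_liter_cedis, for example with all.equal(), is a quick way to check whether these assumptions hold for the published data.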

Example

Here is an example illustrating health risks associated with the water samples collected in Accra.

library(watercostaccra)
library(ggplot2)
library(dplyr)
library(tidyr)

long_data <- waterpoints |> 
  pivot_longer(cols = c(coli_mpn_health_risk, tc_mpn_health_risk),
               names_to = "risk_type",
               values_to = "health_risk")

# Count occurrences of each health_risk category within each community and risk_type
count_data <- long_data |> 
  group_by(community, risk_type, health_risk) |> 
  summarise(count = n(), .groups = 'drop')

facet_labels <- c(
  coli_mpn_health_risk = "E. coli MPN health risk",
  tc_mpn_health_risk = "Total Coliform MPN health risk"
)

# Create the bar plot
ggplot(count_data, aes(x = community, y = count, fill = health_risk)) +
  geom_col(position = "dodge") +
  facet_wrap(~ risk_type, labeller = labeller(risk_type = facet_labels)) +
  labs(title = "Health risk assessment by community",
       x = "community",
       y = "count",
       fill = "health risk") +
  scale_fill_brewer(palette = "Dark2") +
  theme_minimal()

License

Data are available as CC-BY.

Citation

Please cite this package using:

citation("watercostaccra")
#> To cite package 'watercostaccra' in publications use:
#> 
#>   Götschmann M, Vicario E, Davidson B, Amankwaa E, Zhong M (2024).
#>   _watercostaccra: Household water costs and coping strategies data
#>   from metropolitan Accra_. R package version 0.0.0.9000,
#>   <https://github.com/openwashdata/watercostaccra>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {watercostaccra: Household water costs and coping strategies data from metropolitan Accra},
#>     author = {Margaux Götschmann and Elizabeth Vicario and Betty Avanu Davidson and Ebenezer F. Amankwaa and Mian Zhong},
#>     year = {2024},
#>     note = {R package version 0.0.0.9000},
#>     url = {https://github.com/openwashdata/watercostaccra},
#>   }