This vignette demonstrates the washr data package development workflow with a toy dataset called fssample. The fssample data is an (imaginary) 5-day collection of faecal sludge samples.

You are taking 20 samples of 1 liter of faecal sludge from pit latrines and septic tanks at households and public toilets (5 samples for each combination of system and location). For each sample, you note the number of daily users of the sanitation system.
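As a quick check of the sampling design, the sketch below lays out the 2 x 2 x 5 grid in base R. The column names (`replicate`, `location`, `system_type`, `users`) are assumptions made for illustration, not the actual fssample variables, and no measurements are filled in.

```r
# Sampling design only: no measurements are invented here.
# All column names are hypothetical, chosen for this illustration.
design <- expand.grid(
  replicate   = 1:5,
  location    = c("household", "public toilet"),
  system_type = c("pit latrine", "septic tank")
)

# Placeholder for the number of daily users, to be recorded in the field
design$users <- NA_integer_

# Total number of samples, and samples per system/location combination
nrow(design)
table(design$system_type, design$location)
```

Each of the four system and location combinations contributes five rows, which gives the 20 samples described above.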

1. Create and process dataset

After initializing an R package named fssample with devtools, you can start to set up the raw data in the package by executing:

Under the fssample directory, there is now a new directory data-raw with an R script data-processing.R inside it. Go to data-processing.R and refer to our template code to import, clean, and export the dataset. For instance, you may want to convert the column “location” to a factor or reformat the column “date” as YYYY-MM-DD.
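The two cleaning steps mentioned above could look roughly like the following base R sketch. The raw values and the assumed input date format (DD/MM/YYYY) are invented for illustration; adapt the `format` argument to whatever your raw file actually contains.

```r
# Hypothetical raw records; values and the DD/MM/YYYY input format
# are assumptions made purely for this illustration.
raw <- data.frame(
  location = c("household", "public toilet", "household"),
  date     = c("01/11/2023", "02/11/2023", "03/11/2023"),
  stringsAsFactors = FALSE
)

fssample_clean <- raw

# Convert the character column "location" to a factor
fssample_clean$location <- factor(fssample_clean$location)

# Parse the date strings; Date objects print and export as YYYY-MM-DD
fssample_clean$date <- as.Date(fssample_clean$date, format = "%d/%m/%Y")

str(fssample_clean)
```

Storing dates as `Date` objects rather than reformatted strings keeps them sortable and lets downstream code do date arithmetic.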

After executing the last few lines of data-processing.R, namely,

usethis::use_data(fssample, overwrite = TRUE)
fs::dir_create(here::here("inst", "extdata"))
readr::write_csv(fssample,
                 here::here("inst", "extdata", paste0("fssample", ".csv")))
openxlsx::write.xlsx(fssample,
                     here::here("inst", "extdata", paste0("fssample", ".xlsx")))

a directory data/ containing the exported data in .rda format is created in the root directory, and CSV and XLSX copies are written to inst/extdata/.

2. Document dataset

The next step is to provide human- and machine-readable documentation for the dataset and the package itself.

For documenting the package, you work with the DESCRIPTION file by running:

Next comes the dataset documentation. You first create a dictionary for the dataset in CSV file format.

Go to data-raw/dictionary.csv, open the CSV file, and fill in the empty description column of the dictionary. Once the dictionary is complete, you document the dataset by turning the content of the CSV file into roxygen comments by executing:

Go to R/ and fill in the title and description for the dataset.
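To make the dictionary-to-roxygen idea concrete, here is a minimal base R sketch of the three stages: building a dictionary skeleton, filling in descriptions, and rendering roxygen lines. It is not washr's implementation. The helper `dictionary_to_roxygen()`, the dictionary column names, and the toy values are all assumptions made for illustration.

```r
# Hypothetical stand-in for the cleaned dataset (values invented for illustration)
fssample_toy <- data.frame(
  location = factor(c("household", "public toilet")),
  users    = c(4L, 25L)
)

# Stage 1: dictionary skeleton, one row per variable, description left empty.
# The column names here are assumptions, not necessarily washr's layout.
dictionary <- data.frame(
  variable_name = names(fssample_toy),
  variable_type = unname(vapply(fssample_toy, function(x) class(x)[1], character(1))),
  description   = "",
  stringsAsFactors = FALSE
)

# Stage 2: descriptions are normally typed into the CSV by hand
dictionary$description <- c(
  "Where the sanitation system is located",
  "Number of daily users of the sanitation system"
)

# Stage 3: hypothetical helper that renders dictionary rows as roxygen comments
dictionary_to_roxygen <- function(dict, dataset_name) {
  items <- sprintf("#'   \\item{%s}{%s}", dict$variable_name, dict$description)
  c(
    sprintf("#' @format A data frame with %d variables:", nrow(dict)),
    "#' \\describe{",
    items,
    "#' }",
    sprintf("\"%s\"", dataset_name)
  )
}

roxygen_lines <- dictionary_to_roxygen(dictionary, "fssample")
writeLines(roxygen_lines)
```

Keeping the descriptions in a CSV means the same text can feed both the R help page and other outputs, instead of being maintained in two places.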

3. Communicate dataset

The R dataset and documentation are complete. It’s time to communicate with the public using human-readable and visually appealing tools. We currently achieve this with the following two components.

  • README
  • pkgdown website

setup_readme()
# Go to README.Rmd and complete this R Markdown file
build_readme() # Generate README.md from README.Rmd
setup_website()

Now it’s time to polish the README and website. Once you are satisfied with them, don’t forget to re-run build_readme() and pkgdown’s build_site() to update them.