Type: Package
Title: Demographic Analysis and Data Manipulation
Version: 0.4.2
Description: Perform tasks commonly encountered when preparing and analysing demographic data. Some functions are intended for end users, and others for developers. Includes functions for working with life tables.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Depends: R (≥ 4.3.0)
LinkingTo: cpp11
Imports: cli, rlang, rvec, tibble, tidyselect, utils, vctrs
Suggests: bookdown, covr, dplyr, ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://bayesiandemography.github.io/poputils/, https://github.com/bayesiandemography/poputils
BugReports: https://github.com/bayesiandemography/poputils/issues
NeedsCompilation: yes
Packaged: 2025-07-12 04:09:16 UTC; johnbryant
Author: John Bryant [aut, cre], Bayesian Demography Limited [cph]
Maintainer: John Bryant <john@bayesiandemography.com>
Repository: CRAN
Date/Publication: 2025-07-12 04:40:02 UTC

Functions for working with demographic data

Description

Functions for common tasks in demographic analyses. Some functions are aimed at end-users, and others at developers.

For end users

Data manipulation

Labels

Life expectancy, life tables

Fertility

For developers

Checking arguments

Data manipulation

Labels

Author(s)

Maintainer: John Bryant john@bayesiandemography.com

Other contributors:

Bayesian Demography Limited [copyright holder]

See Also

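Useful links:

https://bayesiandemography.github.io/poputils/

https://github.com/bayesiandemography/poputils

Report bugs at https://github.com/bayesiandemography/poputils/issues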


Infer Age Label Type

Description

Determine whether a set of age labels refers to one-year, five-year, or life-table age groups.

Usage

age_group_type(x)

Arguments

x

A vector of age labels

Details

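The valid types of age labels are:

  - "single": one-year age groups, eg "0", "1", ..., "84", "85+"
  - "five": five-year age groups, eg "0-4", "5-9", ..., "80-84", "85+"
  - "lt": life-table age groups, eg "0", "1-4", "5-9", ..., "80-84", "85+"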

If x does not fit any of these descriptions, then age_group_type() throws an error.

If x could belong to more than one type, then age_group_type() prefers "single" to "five" and "lt", and prefers "five" to "lt".

Value

"single", "five", or "lt".

Examples

age_group_type(c("5-9", "0-4", "100+"))
age_group_type(c("2", "5", "1"))
age_group_type(c("0", "1-4"))

## could be "single" or "lt"
age_group_type("0")

## could be "five" or "lt"
age_group_type("80-84")

Create Age Labels

Description

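Create labels for age groups. The labels depend on the type argument:

  - "single": one-year age groups, eg "0", "1", ..., "99", "100+"
  - "five": five-year age groups, eg "0-4", "5-9", ..., "95-99", "100+"
  - "lt": life-table age groups, eg "0", "1-4", "5-9", ..., "95-99", "100+"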

Usage

age_labels(type, min = 0, max = 100, open = NULL)

Arguments

type

Type of age group labels: "single", "five", or "lt".

min

Minimum age. Defaults to 0.

max

Maximum age for closed age groups. Defaults to 100.

open

Whether the last age group is "open", ie has no upper limit.

Details

The first age group starts at the age specified by min. If open is TRUE, then the final age group starts at the age specified by max. Otherwise, the final age group ends at the age specified by max.

open defaults to TRUE when min equals zero, and to FALSE otherwise.
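The rules for min, max, and open can be illustrated with a short base-R sketch. The helper make_age_labels() is hypothetical and is not the poputils implementation. It assumes that "lt" labels start at age 0, and that for "five" and "lt" the value of max is a multiple of 5 and at least 10.

```r
# Hypothetical sketch of the labelling rules, not poputils::age_labels()
make_age_labels <- function(type, min = 0, max = 100, open = NULL) {
  if (is.null(open))
    open <- min == 0                      # default: open only when min is 0
  if (type == "single") {
    lower <- seq(min, max - 1)            # last closed group ends at max
    labels <- as.character(lower)
  } else if (type == "five") {
    lower <- seq(min, max - 5, by = 5)
    labels <- paste0(lower, "-", lower + 4)
  } else if (type == "lt") {
    # assumes min == 0: infant group "0", then "1-4", then five-year groups
    lower <- seq(5, max - 5, by = 5)
    labels <- c("0", "1-4", paste0(lower, "-", lower + 4))
  } else {
    stop("type must be \"single\", \"five\", or \"lt\"")
  }
  if (open)
    labels <- c(labels, paste0(max, "+")) # open group starts at max
  labels
}

make_age_labels("lt", max = 80)
```

With these assumptions, make_age_labels("lt", max = 80) gives "0", "1-4", "5-9", ..., "75-79", "80+".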

Value

A character vector.

See Also

reformat_age()

Examples

age_labels(type = "single", min = 15, max = 40)
age_labels(type = "five")
age_labels(type = "lt", max = 80)

Lower Limits, Midpoints, and Upper Limits of Age Groups

Description

Given a vector x of age group labels, return a numeric vector.

Vector x must describe 1-year, 5-year or life-table age groups: see age_labels() for examples. x can format these age groups in any way understood by reformat_age().

Usage

age_lower(x)

age_mid(x)

age_upper(x)

Arguments

x

A vector of age group labels.

Details

These functions can make age groups easier to work with. Lower and upper limits can be used for selecting on age, and replacing age groups with their midpoints can improve graphs.
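A rough base-R sketch of how lower and upper limits can be read from standard labels is shown below. The helpers parse_age_lower() and parse_age_upper() are hypothetical and handle only the standard formats ("0", "15-19", "50+"), not the wider set understood by reformat_age(). They assume that a closed group "a-b" covers ages from a up to (but not including) b + 1, and that an open group has no upper limit.

```r
# Hypothetical helpers, not the poputils functions
parse_age_lower <- function(x) {
  as.numeric(sub("^([0-9]+).*$", "\\1", x))  # leading digits
}

parse_age_upper <- function(x) {
  upper <- rep(NA_real_, length(x))
  is_open <- grepl("^[0-9]+\\+$", x)          # eg "50+"
  is_range <- grepl("^[0-9]+-[0-9]+$", x)     # eg "15-19"
  is_single <- grepl("^[0-9]+$", x)           # eg "0"
  upper[is_open] <- Inf
  upper[is_range] <- as.numeric(sub("^[0-9]+-", "", x[is_range])) + 1
  upper[is_single] <- as.numeric(x[is_single]) + 1
  upper
}

df <- data.frame(age = c("1-4", "10-14", "5-9", "0"),
                 rate = c(0.023, 0.015, 0.007, 0.068))
df[parse_age_lower(df$age) >= 5, ]  # select on age using lower limits
```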

Value

A numeric vector, the same length as x.

See Also

reformat_age() age_labels()

Examples

x <- c("15-19", "5-9", "50+")
age_lower(x)
age_mid(x)
age_upper(x)

## non-standard formats are OK
age_lower(c("infants", "100 and over"))

df <- data.frame(age = c("1-4", "10-14", "5-9", "0"),
                 rate = c(0.023, 0.015, 0.007, 0.068))
df
subset(df, age_lower(age) >= 5)

Validity Checks for Age Labels

Description

Check that age labels can be parsed and, optionally, whether the labels are complete, unique, start at zero, and end with an open age group.

Usage

check_age(
  x,
  complete = FALSE,
  unique = FALSE,
  zero = FALSE,
  open = FALSE,
  closed = FALSE
)

Arguments

x

A vector of age labels.

complete

If TRUE, test whether x has gaps.

unique

If TRUE, test whether x has duplicates.

zero

If TRUE, test whether youngest age group in x starts at 0.

open

If TRUE, test whether oldest age group in x is open.

closed

If TRUE, test whether oldest age group in x is closed.

Details

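By default, check_age() only tests whether a set of labels can be parsed as single-year, five-year, or life-table age groups. (See age_group_type() for more on the three types of age group.) However, it can also apply the following tests:

  - complete: there are no gaps between age groups
  - unique: there are no duplicated age groups
  - zero: the youngest age group starts at 0
  - open: the oldest age group is open
  - closed: the oldest age group is closed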

Value

TRUE, invisibly, or raises an error if a test fails.

See Also

Examples

try(
  check_age(c("10-14", "0-4", "15+"),
            complete = TRUE)  
)

try(
  check_age(c("10-14", "5-9", "0-4", "5-9", "15+"),
            unique = TRUE)
)

try(
  check_age(c("10-14", "5-9", "15+"),
            zero = TRUE)
)

try(
  check_age(c("10-14", "0-4", "5-9"),
            open = TRUE)
)

try(
  check_age(c("10+", "0-4", "5-9"),
            closed = TRUE)
)

Check that Arguments have Same Length

Description

Check that x and y have the same length.

Usage

check_equal_length(x, y, nm_x, nm_y)

Arguments

x, y

Arguments to compare

nm_x, nm_y

Names to use in error message

Value

TRUE, invisibly.

Examples

x <- 1:3
y <- 3:1
check_equal_length(x = x,
                   y = y,
                   nm_x = "x",
                   nm_y = "y")

Check Whole Number

Description

Check that n is a finite, non-NA scalar that is an integer or integerish (ie is equal to round(n)), and, optionally, that it lies within a specified range and is divisible by a specified number.
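The sequence of checks can be sketched in base R. check_whole_number() is a hypothetical stand-in for check_n(), written only to illustrate the order of the tests; the real function's error messages will differ.

```r
# Hypothetical stand-in for check_n(), for illustration only
check_whole_number <- function(n, nm_n, min = NULL, max = NULL,
                               divisible_by = NULL) {
  # scalar, numeric, finite (is.finite() is FALSE for NA, NaN, Inf)
  if (!is.numeric(n) || length(n) != 1L || !is.finite(n))
    stop(sprintf("`%s` must be a finite, non-NA numeric scalar", nm_n))
  # integerish: equal to its rounded value
  if (n != round(n))
    stop(sprintf("`%s` must be a whole number", nm_n))
  if (!is.null(min) && n < min)
    stop(sprintf("`%s` must be greater than or equal to %s", nm_n, min))
  if (!is.null(max) && n > max)
    stop(sprintf("`%s` must be less than or equal to %s", nm_n, max))
  if (!is.null(divisible_by) && n %% divisible_by != 0)
    stop(sprintf("`%s` must be divisible by %s", nm_n, divisible_by))
  invisible(TRUE)
}

check_whole_number(10, nm_n = "n", min = 5, max = 10, divisible_by = 2)
```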

Usage

check_n(n, nm_n, min, max, divisible_by)

Arguments

n

A whole number

nm_n

Name for 'n' to be used in error messages

min

Minimum value 'n' can take. Can be NULL.

max

Maximum value 'n' can take. Can be NULL.

divisible_by

'n' must be divisible by this. Can be NULL.

Value

If all tests pass, check_n() returns TRUE invisibly. Otherwise it throws an error.

Examples

check_n(10, nm_n = "count", min = 0, max = NULL, divisible_by = 1)
check_n(10, nm_n = "count", min = NULL, max = NULL, divisible_by = NULL)
check_n(10, nm_n = "n", min = 5, max = 10, divisible_by = 2)

Check that Colnum Vectors do not Overlap

Description

Given a named list of colnum vectors, like those produced by tidyselect::eval_select(), throw an error if there is an overlap.
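The overlap test amounts to comparing every pair of elements with intersect(). The sketch below is a hypothetical base-R version for illustration, not the poputils source, and assumes x is a named list as described above.

```r
# Hypothetical base-R version of the overlap check
check_no_overlap_sketch <- function(x) {
  nms <- names(x)
  for (i in seq_along(x)) {
    for (j in seq_along(x)) {
      if (i < j) {                          # each pair compared once
        common <- intersect(x[[i]], x[[j]])
        if (length(common) > 0L)
          stop(sprintf("`%s` and `%s` both select column(s) %s",
                       nms[i], nms[j], paste(common, collapse = ", ")))
      }
    }
  }
  invisible(TRUE)
}

x <- list(arg1 = c(age = 1L),
          arg2 = c(gender = 4L, region = 5L))
check_no_overlap_sketch(x)
```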

Usage

check_no_overlap_colnums(x)

Arguments

x

A named list of integer vectors.

Value

TRUE, invisibly

See Also

tidyselect::eval_select()

Examples

x <- list(arg1 = c(age = 1L),
          arg2 = c(gender = 4L, region = 5L))
check_no_overlap_colnums(x)

Aggregate Age Group Labels

Description

Convert age group labels to a less detailed classification. The three classifications recognized by combine_age() are "single", "five", and "lt", as defined in age_labels(). The following conversions are permitted:

  - "single" to "five"
  - "single" to "lt"
  - "lt" to "five"

Usage

combine_age(x, to = c("five", "lt"))

Arguments

x

A vector of age labels

to

Type of age classification to convert to: "five" or "lt". Defaults to "five".

Value

If x is a factor, then combine_age() returns a factor; otherwise it returns a character vector.
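The "single" to "five" conversion comes down to rounding each age down to a multiple of 5. The sketch below is hypothetical (not poputils::combine_age()), returns a character vector only, and assumes any open age group already starts at a multiple of 5.

```r
# Hypothetical sketch: single-year labels to five-year labels
single_to_five <- function(x) {
  is_open <- grepl("\\+$", x)                # eg "100+"
  age <- as.numeric(sub("\\+$", "", x))
  lower <- 5 * floor(age / 5)                # start of the five-year group
  ifelse(is_open, x, paste0(lower, "-", lower + 4))
}

single_to_five(c("0", "5", "3", "12"))
```

Here single_to_five(c("0", "5", "3", "12")) gives "0-4", "5-9", "0-4", "10-14".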

See Also

Examples

x <- c("0", "5", "3", "12")
combine_age(x)
combine_age(x, to = "lt")

Derive Life Tables that Match Life Expectancies, using a Brass Logit Model

Description

Turn life expectancies at birth into full life tables, using the Brass logit model. The method is simple and is designed for simulations or for settings with little or no data on age-specific mortality rates. In settings where data on age-specific mortality is available, other methods might be more appropriate.

Usage

ex_to_lifetab_brass(
  target,
  standard,
  infant = c("constant", "linear", "CD", "AK"),
  child = c("constant", "linear", "CD"),
  closed = c("constant", "linear"),
  open = "constant",
  radix = 1e+05,
  suffix = NULL
)

Arguments

target

A data frame containing a variable called "ex", and possibly others. See Details.

standard

A data frame containing variables called age and lx, and possibly others. See details.

infant, child, closed, open

Methods used to calculate life expectancy. See lifetab() for details.

radix

Initial population for the lx column in the derived life table(s). Default is 100000.

suffix

Optional suffix added to life table columns.

Value

A data frame containing one or more life tables.

Method

The method implemented by ex_to_lifetab_brass() is based on the observation that, if populations A and B are demographically similar, then, in many cases,

\text{logit}(l_x^{\text{B}}) \approx \alpha + \beta \text{logit}(l_x^{\text{A}})

where l_x is the "survivorship probability" quantity from a life table. When populations are similar, \beta is often close to 1.

Given (i) a target life expectancy, (ii) a set of l_x^{\text{A}} (referred to as a "standard"), and (iii) a value for \beta, ex_to_lifetab_brass() finds a value for \alpha that yields a set of l_x^{\text{B}} with the required life expectancy.
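The search for \alpha can be sketched with base R's qlogis() (logit, without the 1/2 factor) and plogis() (inverse logit), plus uniroot(). This is a simplified illustration, not the method inside ex_to_lifetab_brass(): the helpers brass_lx() and e0_trapezoid() are hypothetical, the standard is a made-up Gompertz survivorship curve, and life expectancy is approximated with the trapezoidal rule over closed ages only, ignoring the open age group. lifetab() uses the methods described in its own help page instead.

```r
# Sketch only: logit(lx_B) = alpha + beta * logit(lx_A)
brass_lx <- function(lx_std, alpha, beta = 1) {
  out <- lx_std
  inner <- lx_std > 0 & lx_std < 1  # logit is infinite at 0 and 1
  out[inner] <- plogis(alpha + beta * qlogis(lx_std[inner]))
  out
}

# Hypothetical approximation: trapezoidal rule over the closed ages only,
# ignoring person-years lived after the last age
e0_trapezoid <- function(lx, ages) {
  n <- diff(ages)
  sum(n * (head(lx, -1) + tail(lx, -1)) / 2) / lx[1]
}

# Made-up standard: Gompertz survivorship at exact ages 0, 1, 5, ..., 80
ages <- c(0, 1, seq(5, 80, by = 5))
lx_std <- exp(-(0.001 / 0.08) * (exp(0.08 * ages) - 1))

# Solve for the alpha that gives the target life expectancy (beta fixed at 1)
target_e0 <- 30
gap <- function(a) e0_trapezoid(brass_lx(lx_std, a), ages) - target_e0
alpha <- uniroot(gap, interval = c(-10, 10), tol = 1e-10)$root
lx_new <- brass_lx(lx_std, alpha)
```

Because a larger \alpha raises every l_x^{\text{B}} between the first and last ages, life expectancy increases with \alpha, so a simple one-dimensional root finder is enough.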

target argument

target is a data frame specifying life expectancies for each population being modelled, and, possibly, inputs to the calculations, and index variables. Values in target are not age-specific.

standard argument

standard is a data frame specifying the l_x to be used with each life expectancy in ex, and, optionally, values for the average number of years lived in each age group by people who die in that group, ax. Values in standard are age-specific.

Internally, standard is merged with target using a left join from target, on any variables that target and standard have in common.

References

Brass W, Coale AJ. 1968. “Methods of analysis and estimation,” in Brass, W, Coale AJ, Demeny P, Heisel DF, et al. (eds). The Demography of Tropical Africa. Princeton NJ: Princeton University Press, pp. 88–139.

Moultrie TA, Timæus IM. 2013. Introduction to Model Life Tables. In Moultrie T, Dorrington R, Hill A, Hill K, Timæus I, Zaba B. (eds). Tools for Demographic Estimation. Paris: International Union for the Scientific Study of Population.

See Also

Examples

## create new life tables based on level-1
## 'West' model life tables, but with lower
## life expectancy

library(dplyr, warn.conflicts = FALSE)

target <- data.frame(sex = c("Female", "Male"), 
                     ex = c(17.5, 15.6))

standard <- west_lifetab |>
    filter(level == 1) |>
    select(sex, age, lx)
    
ex_to_lifetab_brass(target = target,
                    standard = standard,
                    infant = "CD",
                    child = "CD")

Identify Sex or Gender Labels Referring to Females

Description

Given labels for sex or gender, try to infer which (if any) refer to females. If no elements look like a label for females, or if two or more elements do, then return NULL.

Usage

find_label_female(nms)

Arguments

nms

A character vector

Value

An element of nms or NULL.

See Also

find_label_male(), find_var_sexgender()

Examples

find_label_female(c("Female", "Male")) ## one valid
find_label_female(c("0-4", "5-9"))     ## none valid
find_label_female(c("F", "Fem"))       ## two valid

Identify Sex or Gender Labels Referring to Males

Description

Given labels for sex or gender, try to infer which (if any) refer to males. If no elements look like a label for males, or if two or more elements do, then return NULL.

Usage

find_label_male(nms)

Arguments

nms

A character vector

Value

An element of nms or NULL.

See Also

find_label_female(), find_var_sexgender()

Examples

find_label_male(c("Female", "Male")) ## one valid
find_label_male(c("0-4", "5-9"))     ## none valid
find_label_male(c("male", "m"))      ## two valid

Identify an Age Variable

Description

Find the element of nms that looks like an age variable. If no elements look like an age variable, or if two or more elements do, then return NULL.
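The find_var_*() and find_label_*() functions share an "exactly one match, otherwise NULL" rule. The sketch below illustrates that rule with a hypothetical helper, find_unique_match(), and an assumed pattern "^age"; poputils may recognise a different set of names.

```r
# Hypothetical helper: return the single matching name, or NULL
find_unique_match <- function(nms, pattern) {
  is_match <- !is.na(nms) & grepl(pattern, nms, ignore.case = TRUE)
  if (sum(is_match) == 1L) nms[is_match] else NULL
}

find_unique_match(c("Sex", "Year", "AgeGroup", NA), "^age")  # one valid
find_unique_match(c("Sex", "Year"), "^age")                  # none valid
find_unique_match(c("age", "age.years"), "^age")             # two valid
```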

Usage

find_var_age(nms)

Arguments

nms

A character vector

Value

An element of nms, or NULL.

See Also

find_var_time(), find_var_sexgender()

Examples

find_var_age(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_age(c("Sex", "Year"))                 ## none valid
find_var_age(c("age", "age.years"))            ## two valid

Identify a Sex or Gender Variable

Description

Find the element of nms that looks like a sex or gender variable. If no elements look like a sex or gender variable, or if two or more elements do, then return NULL.

Usage

find_var_sexgender(nms)

Arguments

nms

A character vector

Value

An element of nms, or NULL.

See Also

find_var_age(), find_var_time(), find_label_female(), find_label_male()

Examples

find_var_sexgender(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_sexgender(c("Age", "Region"))               ## none valid
find_var_sexgender(c("sexgender", "sexes"))          ## two valid

Identify a Time Variable

Description

Find the element of nms that looks like a time variable. If no elements look like a time variable, or if two or more elements do, then return NULL.

Usage

find_var_time(nms)

Arguments

nms

A character vector

Value

An element of nms, or NULL.

See Also

find_var_age(), find_var_sexgender()

Examples

find_var_time(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_time(c("Sex", "Region"))               ## none valid
find_var_time(c("time", "year"))                ## two valid

Get a named vector of column indices for the grouping variables in a grouped data frame

Description

Construct a named vector of indices equivalent to the vectors produced by tidyselect::eval_select(), but for the grouping variables in an object of class "grouped_df".

Usage

groups_colnums(data)

Arguments

data

A data frame.

Details

If data is not grouped, then groups_colnums returns a zero-length vector.

Value

A named integer vector.

Examples

library(dplyr)
df <- data.frame(x = 1:4,
                 g = c(1, 1, 2, 2))
groups_colnums(df)
df <- group_by(df, g)
groups_colnums(df)

Age-Specific Fertility Rates in Iran

Description

Estimates of age-specific fertility rates (births per 1000 person-years lived) for rural and urban areas in Iran, 1986-2000. Calculated by Mohammad Jalal Abbasi-Shavazi and Peter McDonald using data from the 2000 Iran Demographic and Health Survey.

Usage

iran_fertility

Format

A tibble with 2010 rows and the following columns:

Source

Tables 4.1 and 4.2 of Abbasi-Shavazi, M J, McDonald, P (2005). National and provincial level fertility trends in Iran, 1972–2006. Australian National University. Working Papers in Demography no. 94.


Calculate Life Tables or Life Expectancies

Description

Calculate life table quantities. Function lifetab() returns an entire life table. Function lifeexp() returns life expectancy at birth. The inputs can be mortality rates (mx) or probabilities of dying (qx), though not both.

Usage

lifetab(
  data,
  mx = NULL,
  qx = NULL,
  age = age,
  sex = NULL,
  ax = NULL,
  by = NULL,
  infant = c("constant", "linear", "CD", "AK"),
  child = c("constant", "linear", "CD"),
  closed = c("constant", "linear"),
  open = "constant",
  radix = 1e+05,
  suffix = NULL,
  n_core = 1
)

lifeexp(
  data,
  mx = NULL,
  qx = NULL,
  at = 0,
  age = age,
  sex = NULL,
  ax = NULL,
  by = NULL,
  infant = c("constant", "linear", "CD", "AK"),
  child = c("constant", "linear", "CD"),
  closed = c("constant", "linear"),
  open = "constant",
  suffix = NULL,
  n_core = 1
)

Arguments

data

Data frame with mortality data.

mx

<tidyselect> Mortality rates, expressed as deaths per person-year lived. Possibly an rvec.

qx

<tidyselect> Probability of dying within age interval. An alternative to mx. Possibly an rvec.

age

<tidyselect> Age group labels. The labels must be interpretable by functions such as reformat_age() and age_group_type(). The first age group must start at age 0, and the last age group must be "open", with no upper limit.

sex

<tidyselect> Biological sex, with labels that can be interpreted by reformat_sex(). Needed only when infant is "CD" or "AK", or child is "CD".

ax

<tidyselect> Average age at death within age group. Optional. See Details.

by

<tidyselect> Separate life tables, or life expectancies, calculated for each combination of the by variables. If a sex variable was specified, then that variable is automatically included among the by variables. If data is a grouped data frame, then the grouping variables take precedence over by.

infant

Method used to calculate life table values in age group "0". Ignored if age does not include age group "0". Default is "constant".

child

Method used to calculate life table values in age group "1-4". Ignored if age does not include age group "1-4". Default is "constant".

closed

Method used to calculate life table values in closed age intervals other than "0" and "1-4" (ie intervals such as "10-14" or "12"). Default is "constant".

open

Method used to calculate life table values in the final, open age group (eg "80+" or "110+"). Currently the only option is "constant".

radix

Initial population for the lx column. Default is 100000.

suffix

Optional suffix added to new columns in result.

n_core

Number of cores to use for parallel processing. If n_core is 1 (the default), no parallel processing is done.

at

Age at which life expectancy is calculated (lifeexp() only). Default is 0. Can be a vector with length > 1.

Value

A tibble.

Definitions of life table quantities

Mortality rates mx are sometimes expressed as deaths per 1000 person-years lived, or per 100,000 person-years lived. lifetab() and lifeexp() assume that they are expressed as deaths per person-year lived, so rates expressed per 1000 or per 100,000 need to be divided by 1000 or 100,000 first.

Calculation methods

lifetab() and lifeexp() implement several methods for calculating life table quantities from mortality rates. Each method makes different assumptions about the way that mortality rates vary within age intervals:

For a detailed description of the methods, see the vignette for poputils.
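For a single closed age interval of width n, the "constant" and "linear" assumptions lead to simple closed-form results. The sketch below is illustrative only and is not the poputils implementation. It reads "constant" as a constant mortality rate within the interval, and "linear" (an assumption here) as lx falling in a straight line within the interval, so that deaths are spread evenly and ax = n / 2. The helper interval_qx_ax() is hypothetical and assumes m > 0.

```r
# Hypothetical helper: qx and ax for one closed interval, given m and n
interval_qx_ax <- function(m, n, method = c("constant", "linear")) {
  method <- match.arg(method)
  if (method == "constant") {
    p <- exp(-n * m)                 # probability of surviving the interval
    qx <- 1 - p
    ax <- 1 / m - n * p / (1 - p)    # average years lived by those who die
  } else {
    qx <- n * m / (1 + n * m / 2)
    ax <- rep(n / 2, length(m))
  }
  list(qx = qx, ax = ax)
}

interval_qx_ax(0.1, 5, "constant")
interval_qx_ax(0.1, 5, "linear")
```

Under the constant-rate assumption, ax is a little less than n / 2, because more of the deaths happen early in the interval, when more people are still alive.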

ax

ax is the average number of years lived in an age interval by people who die in that interval. Demographers sometimes refer to it as the 'separation factor'. If a non-NA value of ax is supplied for an age group, then the results for that age group are based on the formula

m_x = \frac{d_x}{n_x l_{x+n_x} + a_x d_x}

(where n_x is the width of the age interval and l_{x+n_x} is the number of people surviving to the end of the interval), overriding any methods specified via the infant, child, closed, and open arguments.
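When ax is supplied for a closed age group, qx follows directly from mx. Person-years lived in the interval are n_x l_{x+n_x} + a_x d_x, and m_x is d_x divided by person-years. Writing d_x = q_x l_x and l_{x+n_x} = (1 - q_x) l_x, then solving for q_x, gives q_x = n_x m_x / (1 + (n_x - a_x) m_x). The helper qx_from_mx_ax() below is a hypothetical sketch of that rearrangement, and the input values are made up.

```r
# Hypothetical helper: probability of dying, given rate, ax, and width
qx_from_mx_ax <- function(mx, ax, nx) {
  nx * mx / (1 + (nx - ax) * mx)
}

# Made-up inputs for age groups "0", "1-4", and "5-9"
mx <- c(0.02, 0.001, 0.0005)
ax <- c(0.1, 1.5, 2.5)
nx <- c(1, 4, 5)
qx <- qx_from_mx_ax(mx, ax, nx)
qx
```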

Open age group when inputs are qx

The probability of dying, qx, is always 1 in the final (open) age group. qx therefore provides no direct information on mortality conditions within the final age group. lifetab() and lifeexp() use conditions in the second-to-final age group as a proxy for conditions in the final age group. When open is "constant" (which is currently the only option), and no value for ax in the final age group is provided, lifetab() and lifeexp() assume that m_A = m_{A-1}, and set L_{A} = l_A / m_A.

In practice, mortality is likely to be higher in the final age group than in the second-to-final age group, so the default procedure is likely to lead to inaccuracies. When the size of the final age group is very small, these inaccuracies will be inconsequential. But in other cases, it may be necessary to supply an explicit value for ax for the final age group, or to use mx rather than qx as inputs.

Using rvecs to represent uncertainty

An rvec is a 'random vector', holding multiple draws from a distribution. Using an rvec for the mx argument to lifetab() or lifeexp() is a way of representing uncertainty. This uncertainty is propagated through to the life table values, which will also be rvecs.

Parallel processing

Calculations can be slow when working with rvecs and many combinations of 'by' variables. In these cases, setting n_core to a number greater than 1, which triggers parallel processing, may help.

References

See Also

Examples

library(dplyr)

## life table for females, based on 'level 1'
## mortality rates from the "West" model life tables
west_lifetab |>
    filter(sex == "Female",
           level == 1) |>
    lifetab(mx = mx)

## change method for infant and children from
## default ("constant") to "CD"
west_lifetab |>
    filter(sex == "Female",
           level == 1) |>
    lifetab(mx = mx,
            sex = sex,
            infant = "CD",
            child = "CD")

## calculate life expectancies
## for all levels, using the 'by'
## argument to distinguish levels
west_lifetab |>
    lifeexp(mx = mx,
            sex = sex,
            infant = "CD",
            child = "CD",
            by = level)

## obtain the same result using
## 'group_by'
west_lifetab |>
  group_by(level) |>
  lifeexp(mx = mx,
          sex = sex,
          infant = "CD",
          child = "CD")

## calculations based on 'qx'
west_lifetab |>
  lifeexp(qx = qx,
          sex = sex,
          by = level)

## life expectancy at age 60
west_lifetab |>
  filter(level == 10) |>
  lifeexp(mx = mx,
          at = 60,
          sex = sex)

## life expectancy at ages 0 and 60
west_lifetab |>
  filter(level == 10) |>
  lifeexp(mx = mx,
          at = c(0, 60),
          sex = sex)

Logit and Inverse-Logit Functions

Description

Transform values to and from the logit scale.

Usage

logit(p)

invlogit(x)

Arguments

p

Values in the interval [0, 1]. Can be an atomic vector, a matrix, or an rvec.

x

Values in the interval (-Inf, Inf). Can be an atomic vector, a matrix, or an rvec.

Details

logit() calculates

x = \log \left(\frac{p}{1 - p}\right)

and invlogit() calculates

p = \frac{e^x}{1 + e^x}

To avoid overflow, invlogit() uses p = \frac{1}{1 + e^{-x}} internally for values of x greater than 0.

In some of the demographic literature, the logit function is defined as

x = \frac{1}{2} \log \left(\frac{p}{1 - p}\right).

logit() and invlogit() follow the conventions in statistics and machine learning, and omit the \frac{1}{2}.
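The two formulas, and the overflow guard, can be sketched in base R. logit_sketch(), invlogit_sketch(), and half_logit() are hypothetical names used only for illustration; they handle ordinary vectors and matrices but not rvecs.

```r
# Base-R sketch of the formulas above (not the poputils source)
logit_sketch <- function(p) log(p / (1 - p))

invlogit_sketch <- function(x) {
  out <- exp(x) / (1 + exp(x))         # Inf / Inf = NaN for large x ...
  pos <- !is.na(x) & x > 0
  out[pos] <- 1 / (1 + exp(-x[pos]))   # ... so use this form when x > 0
  out
}

# The demographic convention, with the extra factor of 1/2
half_logit <- function(p) 0.5 * logit_sketch(p)

p <- c(0.5, 1, 0.2)
invlogit_sketch(logit_sketch(p))
```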

Value

Examples

p <- c(0.5, 1, 0.2)
logit(p)
invlogit(logit(p))

Turn a Matrix Into a List of Columns or Rows

Description

Given a matrix, create a list, each element of which contains a column or row from the matrix.

Usage

matrix_to_list_of_cols(m)

matrix_to_list_of_rows(m)

Arguments

m

A matrix

Details

matrix_to_list_of_cols() and matrix_to_list_of_rows() are internal functions, for use by developers, and would not normally be called directly by end users.
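Equivalent behaviour can be sketched with lapply() in base R. cols_as_list() and rows_as_list() are hypothetical names; whether the poputils functions keep row or column names is not covered here.

```r
# Hypothetical base-R equivalents, for illustration
cols_as_list <- function(m) lapply(seq_len(ncol(m)), function(j) m[, j])
rows_as_list <- function(m) lapply(seq_len(nrow(m)), function(i) m[i, ])

m <- matrix(1:12, nrow = 3)
cols_as_list(m)  # 4 elements, each of length 3
rows_as_list(m)  # 3 elements, each of length 4
```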

Value

Examples

m <- matrix(1:12, nrow = 3)
matrix_to_list_of_cols(m)
matrix_to_list_of_rows(m)

Mortality Data for New Zealand

Description

Counts of deaths and population, by age, sex, and calendar year, plus mortality rates, for New Zealand, 2021-2022.

Usage

nzmort

Format

A data frame with 84 rows and the following variables:

Source

Modified from data in tables "Deaths by age and sex (Annual-Dec)" and "Estimated Resident Population by Age and Sex (1991+) (Annual-Dec)" from Stats NZ online database Infoshare, downloaded on 24 September 2023.


Mortality Data and Probabilistic Rates for New Zealand

Description

A modified version of nzmort where the mx column is an rvec, rather than an ordinary R vector. The rvec holds random draws from the posterior distribution obtained from a Bayesian statistical model.

Usage

nzmort_rvec

Format

An object of class tbl_df (inherits from tbl, data.frame) with 84 rows and 4 columns.


Convert q0 to m0

Description

Convert the probability of dying during infancy (q0) to the mortality rate for infancy (m0).

Usage

q0_to_m0(
  q0,
  sex = NULL,
  a0 = NULL,
  infant = c("constant", "linear", "CD", "AK")
)

Arguments

q0

Probability of dying in the first year of life. A numeric vector or an rvec.

sex

Biological sex. A vector the same length as q0, with labels that can be interpreted by reformat_sex(). Needed only when infant is "CD" or "AK".

a0

Average age at death for infants who die. Optional. See help for lifetab().

infant

Calculation method. See help for lifetab(). Default is "constant".

Value

A numeric vector or rvec.

Warning

The term "infant mortality rate" is ambiguous. Demographers sometimes use it to refer to m0 (which is an actual rate) and sometimes use it to refer to q0 (which is a probability).
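The difference between the two quantities is clearest in the two simplest cases. The sketch below is not q0_to_m0() itself, and the helpers are hypothetical. With a constant rate over the first year, q0 = 1 - exp(-m0), so m0 = -log(1 - q0). With a0 supplied, q0 = m0 / (1 + (1 - a0) m0), so m0 = q0 / (1 - (1 - a0) q0). The input values are made up.

```r
# Hypothetical helpers for two simple cases
m0_constant <- function(q0) -log1p(-q0)                      # constant rate
m0_from_a0 <- function(q0, a0) q0 / (1 - (1 - a0) * q0)      # a0 supplied

q0 <- c(0.2, 0.05, 0.01)
m0_constant(q0)
m0_from_a0(q0, a0 = 0.1)
```

In both cases m0 is larger than q0, which is one reason why mixing up the two meanings of "infant mortality rate" can matter.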

See Also

Examples

library(dplyr, warn.conflicts = FALSE)
west_lifetab |>
 filter(age == 0, level <= 5) |>
 select(level, sex, age, mx, qx) |>
 mutate(m0 = q0_to_m0(q0 = qx, sex = sex, infant = "CD"))

Reformat Age Group Labels

Description

Convert age group labels to one of three formats: single-year, five-year, or life-table age groups, as produced by age_labels().

By default reformat_age() returns a factor that includes all intermediate age groups. See below for examples.

Usage

reformat_age(x, factor = TRUE)

Arguments

x

A vector.

factor

Whether the return value should be a factor.

Details

reformat_age() applies the following algorithm:

  1. Tidy and translate text, eg convert "20 to 24 years" to "20-24", convert "infant" to "0", or convert "100 or more" to "100+".

  2. Check whether the resulting labels could have been produced by age_labels(). If not, throw an error.

  3. If factor is TRUE (the default), then return a factor. The levels of this factor include all intermediate age groups. Otherwise return a character vector.
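Step 1 of the algorithm can be sketched in base R. The helper tidy_age_text() and its regular expressions are hypothetical; they cover only the example phrasings mentioned above and are not the rules that reformat_age() actually uses.

```r
# Hypothetical helper for step 1 only: normalise a few common phrasings
tidy_age_text <- function(x) {
  x <- trimws(tolower(x))
  x <- trimws(sub("years?$", "", x, perl = TRUE))  # "1-4 years" -> "1-4"
  # "20 to 24" or "10--14" -> "20-24", "10-14"
  x <- sub("^(\\d+)\\s*(?:to|-+)\\s*(\\d+)$", "\\1-\\2", x, perl = TRUE)
  # "100 or more", "100 and over", "90plus" -> "100+", "90+"
  x <- sub("^(\\d+)\\s*(?:or more|and over|plus|\\+)$", "\\1+", x, perl = TRUE)
  x[x %in% c("infant", "infants")] <- "0"
  x
}

tidy_age_text(c("20 to 24 years", "infant", "100 or more", "10--14"))
```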

When x consists entirely of numbers, reformat_age() also checks for two special cases:

Value

If factor is TRUE, then reformat_age() returns a factor; otherwise it returns a character vector.

See Also

age_labels(), reformat_sex()

Examples

reformat_age(c("80 to 84", "90 or more", "85 to 89"))

## factor contains intermediate level missing from 'x'
reformat_age(c("80 to 84", "90 or more"))

## non-factor
reformat_age(c("80 to 84", "90 or more"),
             factor = FALSE)

## single
reformat_age(c("80", "90plus"))

## life table
reformat_age(c("0",
               "30-34",
               "10--14",
               "1-4 years"))

Reformat a Binary Sex Variable

Description

Reformat a binary sex variable so that it consists entirely of values "Female", "Male", and possibly NA and any values included in except.

Usage

reformat_sex(x, except = NULL, factor = TRUE)

Arguments

x

A vector.

except

Values to exclude when reformatting.

factor

Whether the return value should be a factor.

Details

When parsing labels, reformat_sex() ignores case: "FEMALE" and "fEmAlE" are equivalent.

White space is removed from the beginning and end of labels.

reformat_sex() does not try to interpret numeric codes (eg 1, 2).
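The parsing rules above (ignore case, trim white space, leave NA and values in except alone, reject unrecognised labels such as numeric codes) can be sketched in base R. reformat_sex_sketch() is hypothetical, and the accepted spellings in its regular expressions are assumptions rather than the list poputils uses.

```r
# Hypothetical sketch of the parsing rules; accepted spellings are assumed
reformat_sex_sketch <- function(x, except = NULL, factor = TRUE) {
  x <- as.character(x)
  key <- tolower(trimws(x))                     # ignore case and padding
  out <- rep(NA_character_, length(x))
  is_except <- !is.na(x) & x %in% except
  is_female <- !is_except & grepl("^f(em(ale)?s?)?$", key)
  is_male <- !is_except & grepl("^m(ales?)?$", key)
  out[is_female] <- "Female"
  out[is_male] <- "Male"
  out[is_except] <- x[is_except]                # passed through unchanged
  unrecognised <- !is.na(x) & !(is_female | is_male | is_except)
  if (any(unrecognised))
    stop("cannot interpret: ", paste(unique(x[unrecognised]), collapse = ", "))
  if (factor)
    out <- factor(out, levels = unique(c("Female", "Male", except)))
  out
}

reformat_sex_sketch(c("F", "female", NA, "MALES"))
reformat_sex_sketch(c("Fem", "Other", "Male", "M"),
                    except = c("Other", "Diverse"))
```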

Value

If factor is TRUE, then reformat_sex() returns a factor; otherwise it returns a character vector.

See Also

age_labels(), reformat_age()

Examples

reformat_sex(c("F", "female", NA, "MALES"))

## values supplied for 'except'
reformat_sex(c("Fem", "Other", "Male", "M"),
             except = c("Other", "Diverse"))

## return an ordinary character vector
reformat_sex(c("F", "female", NA, "MALES"),
             factor = FALSE)

Randomly Round A Vector of Integers to Base 3

Description

Apply the 'Random Round to Base 3' (RR3) algorithm to a vector of integers (or doubles where round(x) == x).

Usage

rr3(x)

Arguments

x

A vector of integers (in the sense that round(x) == x). Can be an rvec.

Details

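The RR3 algorithm is used by statistical agencies to confidentialize data. Under the RR3 algorithm, an integer n is randomly rounded as follows:

  - If n is divisible by 3, it is left unchanged.
  - If n leaves a remainder of 1 when divided by 3, it is rounded down to the nearest multiple of 3 with probability 2/3, and up with probability 1/3.
  - If n leaves a remainder of 2 when divided by 3, it is rounded up to the nearest multiple of 3 with probability 2/3, and down with probability 1/3.

RR3 has some nice properties:

  - The expected value of the rounded number equals the original number, so the rounding is unbiased.
  - The rounded number never differs from the original number by more than 2.
  - Numbers that are already multiples of 3 are not changed.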

Value

A randomly-rounded version of x.
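The rounding rule can be sketched in base R for ordinary numeric vectors. rr3_sketch() is a hypothetical name, not the poputils source, and does not handle rvecs. Note that R's %% operator returns a non-negative remainder, so -1 is treated as 3 * (-1) + 2.

```r
# Hypothetical sketch of RR3 for plain numeric vectors
rr3_sketch <- function(x) {
  r <- x %% 3                  # remainder in {0, 1, 2}; NA stays NA
  down <- x - r                # nearest multiple of 3 at or below x
  u <- runif(length(x))
  out <- x                     # multiples of 3 and NAs are unchanged
  rem1 <- !is.na(r) & r == 1
  rem2 <- !is.na(r) & r == 2
  out[rem1] <- ifelse(u[rem1] < 2/3, down[rem1], down[rem1] + 3)
  out[rem2] <- ifelse(u[rem2] < 2/3, down[rem2] + 3, down[rem2])
  out
}

set.seed(1)
rr3_sketch(c(1, 5, 2, 0, -1, 3, NA))
```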

Examples

x <- c(1, 5, 2, 0, -1, 3, NA)
rr3(x)

Specify Open Age Group

Description

Set the lower limit of the open age group. Given a vector of age group labels, recode all age groups with a lower limit greater than or equal to <lower> to <lower>+.

Usage

set_age_open(x, lower)

Arguments

x

A vector of age labels.

lower

An integer. The lower limit for the open age group.

Details

set_age_open() requires that x and the return value have a five-year, single-year, or life-table format, as described in age_labels().
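The recoding step can be sketched in base R. set_open_sketch() is hypothetical; unlike set_age_open(), it does not check that lower falls on an age group boundary or that the result is a valid set of labels.

```r
# Hypothetical sketch: recode groups starting at or above `lower`
set_open_sketch <- function(x, lower) {
  age_lower <- as.numeric(sub("^([0-9]+).*$", "\\1", x))  # leading digits
  x[age_lower >= lower] <- paste0(lower, "+")
  x
}

x <- c("100+", "80-84", "95-99", "20-24")
set_open_sketch(x, 90)
set_open_sketch(x, 25)
```

Here set_open_sketch(x, 90) gives "90+", "80-84", "90+", "20-24".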

Value

A modified version of x.

See Also

Examples

x <- c("100+", "80-84", "95-99", "20-24")
set_age_open(x, 90)
set_age_open(x, 25)

Calculate Total Fertility Rates

Description

Calculate the total fertility rate (TFR) from age-specific fertility rates.

Usage

tfr(
  data,
  asfr = NULL,
  age = age,
  sex = NULL,
  by = NULL,
  denominator = 1,
  suffix = NULL
)

Arguments

data

Data frame with age-specific fertility rates and age.

asfr

<tidyselect> Age-specific fertility rates. Possibly an rvec.

age

<tidyselect> Age group labels. The labels must be interpretable by functions such as reformat_age() and age_group_type(). The age groups must not have gaps, and the highest age group must be "closed" (ie have an upper limit.)

sex

<tidyselect> Sex/gender of the child (not the parent).

by

<tidyselect> Separate total fertility rates are calculated for each combination of the by variables. If data is a grouped data frame, then the grouping variables take precedence over by.

denominator

The denominator used to calculate asfr. Default is 1.

suffix

Optional suffix added to "tfr" column in result.

Details

The total fertility rate is a summary measure of current fertility levels that removes the effect of age structure. It is obtained by summing age-specific fertility rates, after multiplying each rate by the width of the corresponding age group. For instance, the rate for age group "15-19" is multiplied by 5, and the rate for age group "15" is multiplied by 1.
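The width-weighted sum can be sketched in base R for a single population. The helpers age_width() and tfr_sketch() are hypothetical; they take plain vectors rather than tidyselect arguments, assume closed labels such as "15-19" or "15", and do not handle sex or by variables. The rates are made up.

```r
# Hypothetical helpers, not poputils::tfr()
age_width <- function(age) {
  lower <- as.numeric(sub("-.*$", "", age))
  upper <- as.numeric(sub("^.*-", "", age))  # equals lower for "15"
  upper - lower + 1
}

tfr_sketch <- function(asfr, age, denominator = 1) {
  sum(asfr * age_width(age)) / denominator
}

# Made-up rates, expressed as births per 1000 person-years lived
age <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")
asfr <- c(20, 100, 120, 80, 40, 10, 1)
tfr_sketch(asfr, age, denominator = 1000)
```

With these made-up rates the result is 5 × 371 / 1000 = 1.855.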

The total fertility rate can be interpreted as the average number of children that a person would have, under prevailing fertility rates, if the person survived to the maximum age of reproduction. The hypothetical person is normally a woman, since age-specific fertility rates normally use person-years lived by women as the denominator. But it can apply to men, if the age-specific fertility rates are "paternity rates", ie rates that use person-years lived by men as the denominator.

Value

A tibble.

Sex-specific fertility rates

Age-specific fertility rates do not normally specify the sex of the children who are born. In cases where they do, however, rates have to be summed across sexes to give the total fertility rates. If tfr() is supplied with a sex argument, it assumes that sex applies to the births, and sums over the sexes.

Denominator

Published tables of age-specific fertility rates often express the rates as births per 1000 person-years lived, rather than per person-year lived. (Sometimes this is expressed as "births per 1000 women".) In these cases, set denominator to 1000.

Using rvecs to represent uncertainty

An rvec is a 'random vector', holding multiple draws from a distribution. Using an rvec for the asfr argument to tfr() is a way of representing uncertainty. This uncertainty is propagated through to the TFR, which will also be an rvec.

See Also

Examples

iran_fertility |>
  tfr(asfr = rate,
      by = c(area, time),
      denominator = 1000)

Build a Matrix from Measure and ID Variables

Description

Build a matrix where the elements are values of a measure variable, and the rows and columns are formed by observed combinations of ID variables. The ID variables picked out by rows and cols must uniquely identify cells. to_matrix(), unlike stats::xtabs(), does not sum across multiple combinations of ID variables.
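The uniqueness requirement can be illustrated with a base-R sketch. to_matrix_sketch() is hypothetical and is not poputils::to_matrix(): it takes column names as strings rather than tidyselect expressions, and orders rows and columns by first appearance, which may differ from the real function.

```r
# Hypothetical sketch, not poputils::to_matrix()
to_matrix_sketch <- function(x, rows, cols, measure) {
  row_key <- do.call(paste, c(as.list(x[rows]), sep = "."))
  col_key <- do.call(paste, c(as.list(x[cols]), sep = "."))
  # each (row, column) combination must occur at most once; no summing
  if (anyDuplicated(paste(row_key, col_key, sep = "\r")) > 0L)
    stop("ID variables do not uniquely identify cells")
  row_lev <- unique(row_key)
  col_lev <- unique(col_key)
  m <- matrix(NA, nrow = length(row_lev), ncol = length(col_lev),
              dimnames = list(row_lev, col_lev))
  m[cbind(match(row_key, row_lev), match(col_key, col_lev))] <- x[[measure]]
  m
}

x <- expand.grid(age = c(0, 1, 2), sex = c("F", "M"),
                 region = c("A", "B"), year = 2000:2001)
x$count <- 1:24
to_matrix_sketch(x, rows = c("age", "sex"),
                 cols = c("region", "year"), measure = "count")
```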

Usage

to_matrix(x, rows, cols, measure)

Arguments

x

A data frame.

rows

The ID variable(s) used to distinguish rows in the matrix.

cols

The ID variable(s) used to distinguish columns in the matrix.

measure

The measure variable, eg rates or counts.

Value

A matrix

Examples

x <- expand.grid(age = c(0, 1, 2),
                 sex = c("F", "M"),
                 region = c("A", "B"),
                 year = 2000:2001)
x$count <- 1:24

to_matrix(x,
          rows = c(age, sex),
          cols = c(region, year),
          measure = count)

to_matrix(x,
          rows = c(age, sex, region),
          cols = year,
          measure = count)

## cells not uniquely identified
try(
to_matrix(x,
          rows = age,
          cols = sex,
          measure = count)
)

Trim Values So They Are Between 0 and 1

Description

Trim a vector so that all values are greater than 0 and less than 1.

Usage

trim_01(x)

Arguments

x

A numeric vector. Can be an rvec.

Details

If

Value

A trimmed version of x

See Also

Examples

x <- c(1, 0.98, -0.001, 0.5, 0.01)
trim_01(x)

Coale-Demeny West Model Life Tables

Description

Life table quantities from the "West" family of Coale-Demeny model life tables.

Usage

west_lifetab

Format

A data frame with 1,050 rows and the following variables:

Source

Coale A, Demeny P, and Vaughn B. 1983. Regional model life tables and stable populations. 2nd ed. New York: Academic Press, accessed via demogR::cdmltw().