manydist: Unbiased Distances for Mixed-Type Data
A comprehensive framework for calculating unbiased distances in datasets
containing mixed-type variables (numerical and categorical). The package implements
a general formulation that ensures multivariate additivity and commensurability,
meaning that variables contribute equally to the overall distance regardless of
their type, scale, or distribution. Supports multiple distance measures including
Gower's distance, Euclidean distance, Manhattan distance, and various categorical
variable distances such as simple matching, Eskin, occurrence frequency, and
association-based distances. Provides tools for variable scaling (standard
deviation, range, robust range, and principal component scaling), and handles
both independent and association-based category dissimilarities. Implements
methods to correct for biases that typically arise from different variable types,
distributions, and numbers of categories. Particularly useful for cluster analysis,
data visualization, and other distance-based methods when working with mixed data.
Methods are based on van de Velden et al. (2024), "Unbiased mixed variables distance"
<doi:10.48550/arXiv.2411.00429>.
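For orientation, here is a minimal Python sketch of one of the baseline measures named above: Gower's distance, with range scaling for numeric variables and simple matching for categorical ones. This is not manydist's unbiased formulation and not its R API. It assumes equal variable weights and no missing values. The helper names `column_ranges`, `gower_distance` and `gower_matrix`, along with the toy records, are hypothetical and exist only for illustration.

```python
"""Sketch of Gower's distance for mixed numeric/categorical records.

Assumptions (hypothetical, not manydist's API): numeric variables are
range-scaled, categorical variables use simple matching, all variables get
equal weight, and there are no missing values.
"""
from typing import List, Sequence, Union

Value = Union[float, str]
Record = Sequence[Value]


def column_ranges(rows: Sequence[Record], is_numeric: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numeric column; 0.0 placeholder for categorical ones."""
    ranges: List[float] = []
    for j, numeric in enumerate(is_numeric):
        if numeric:
            col = [float(r[j]) for r in rows]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(0.0)  # not used for categorical variables
    return ranges


def gower_distance(x: Record, y: Record, is_numeric: Sequence[bool],
                   ranges: Sequence[float]) -> float:
    """Average of per-variable dissimilarities, each in [0, 1]."""
    total = 0.0
    for xj, yj, numeric, r in zip(x, y, is_numeric, ranges):
        if numeric:
            # Range scaling: |x_j - y_j| / range_j; a constant column contributes 0.
            total += abs(float(xj) - float(yj)) / r if r > 0 else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if xj == yj else 1.0
    return total / len(is_numeric)


def gower_matrix(rows: Sequence[Record], is_numeric: Sequence[bool]) -> List[List[float]]:
    """Full symmetric n x n distance matrix."""
    ranges = column_ranges(rows, is_numeric)
    return [[gower_distance(a, b, is_numeric, ranges) for b in rows] for a in rows]


# Toy records (made up for illustration): two numeric, two categorical variables.
TOY_ROWS = [
    (1.0, 10.0, "a", "x"),
    (3.0, 30.0, "b", "x"),
    (2.0, 20.0, "a", "y"),
]
TOY_IS_NUMERIC = (True, True, False, False)

if __name__ == "__main__":
    for row in gower_matrix(TOY_ROWS, TOY_IS_NUMERIC):
        print(["%.2f" % d for d in row])
```

In the toy data, the first two records differ by a full range on both numeric variables and disagree on one category. Their distance is therefore 3/4 = 0.75. Every categorical mismatch counts as 1, regardless of how many categories the variable has. This is one example of the type- and category-count effects that the description says the package corrects.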
Version: 0.4.3
Depends: R (≥ 4.1.0)
Imports: entropy, Matrix, fastDummies, data.table, philentropy, cluster, purrr, dplyr, tidyr, forcats, tibble, magrittr, fpc, recipes, rsample, Rfast, readr, distances
Suggests: palmerpenguins
Published: 2025-02-12
DOI: 10.32614/CRAN.package.manydist
Author: Alfonso Iodice D'Enza [aut], Angelos Markos [aut, cre], Michel van de Velden [aut], Carlo Cavicchia [aut]
Maintainer: Angelos Markos <amarkos at gmail.com>
License: GPL-3
NeedsCompilation: no
Citation: manydist citation info
CRAN checks: manydist results
Linking:
Please use the canonical form
https://CRAN.R-project.org/package=manydist
to link to this page.