dumblump

Lump a numeric variable into categorical groups using the ‘dumblump’ algorithm.

The dumblump algorithm:

1. Sort the numbers in ascending order.
2. For each number, check its distance from the previous number (the closest lower or equal number in the dataset).
3. If distance >= threshold, define a new group. If distance < threshold, ‘lump’ the number with the group of the previous number.
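
The steps above can be sketched in base R. This is an illustrative re-implementation, not the package's source: the function name `dumblump_sketch` is hypothetical, and returning character labels of the form "Group N" in the original input order is an assumption based on the usage output shown in this README.

```r
# Hypothetical sketch of the algorithm described above; not the package's code.
dumblump_sketch <- function(x, threshold) {
  stopifnot(is.numeric(x), !anyNA(x), length(threshold) == 1)
  if (length(x) == 0) return(character(0))

  ord <- order(x)        # step 1: sort ascending (keep the permutation)
  gaps <- diff(x[ord])   # step 2: distance from the previous sorted number

  # step 3: a gap >= threshold starts a new group, otherwise stay in the current one
  group_sorted <- cumsum(c(TRUE, gaps >= threshold))

  # map group ids back to the original input order
  group <- integer(length(x))
  group[ord] <- group_sorted
  paste("Group", group)
}

dumblump_sketch(c(1, 1, 2, 5, 5, 6, 1, 12, 12), threshold = 1)
```

Because group ids are assigned on the sorted values and then mapped back, equal values always share a group regardless of where they appear in the input.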

Disadvantages of this method

1. You can get numbers of substantially different scales in a single group. E.g. if you have a set of numbers 1, 2, 3, 4, 5, 6, 7 … 100000, these will all be classified as a single group unless there's a ‘break’ of >= threshold somewhere along the way. If this is not what you want, explore clustering methods.
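
To see this chaining effect concretely, here is a small base R sketch that applies the same gap rule directly, without calling the package. The data and the threshold of 2 are made up for illustration.

```r
# Hypothetical data: an evenly spaced run plus one distant value.
x <- c(1:100, 5000)
threshold <- 2

sorted <- sort(x)
# A gap >= threshold starts a new group; every gap inside 1:100 is only 1.
group <- cumsum(c(TRUE, diff(sorted) >= threshold))

# Minimum and maximum of each group
spans <- tapply(sorted, group, range)
spans[[1]]     # first group runs from 1 to 100
length(spans)  # only two groups in total
```

Every neighbouring pair inside the run is closer than the threshold, so the first group stretches across the whole run even though its endpoints are far apart.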

Installation

You can install the development version of dumblump like so:

# install.packages('remotes')
remotes::install_github('selkamand/dumblump')

Usage

A basic example: lump a numeric vector into groups using a threshold of 1. Groups are returned in the original input order:

library(dumblump)

unlumped <- c(1, 1, 2, 5, 5, 6, 1, 12, 12)

lumped <- dumblump(unlumped, threshold = 1)
data.frame(lumped, unlumped)
#>    lumped unlumped
#> 1 Group 1        1
#> 2 Group 1        1
#> 3 Group 2        2
#> 4 Group 3        5
#> 5 Group 3        5
#> 6 Group 4        6
#> 7 Group 1        1
#> 8 Group 5       12
#> 9 Group 5       12
