update README
leeper committed Jun 16, 2016
1 parent 45f7b81 commit a270319
Showing 2 changed files with 114 additions and 18 deletions.
50 changes: 41 additions & 9 deletions README.Rmd
@@ -1,9 +1,4 @@
# Universal Numeric Fingerprint #

[![Build Status](https://travis-ci.org/leeper/UNF.svg?branch=master)](https://travis-ci.org/leeper/UNF)
[![Build status](https://ci.appveyor.com/api/projects/status/tx3dkw1rsr9kijm4?svg=true)](https://ci.appveyor.com/project/leeper/unf)
[![codecov.io](http://codecov.io/github/leeper/UNF/coverage.svg?branch=master)](http://codecov.io/github/leeper/UNF?branch=master)
![Downloads](http://cranlogs.r-pkg.org/badges/UNF)
# Universal Numeric Fingerprint

UNF is a cryptographic hash or signature that can be used to uniquely identify (a version of) a rectangular dataset, or a subset thereof. UNF can be used, in tandem with a DOI or Handle, to form a persistent citation to a versioned dataset. A UNF signature is printed in the following form:

@@ -15,7 +10,7 @@ This allows a data consumer to quickly, easily, and definitively verify an in-ha

Please report any mismatches between this implementation and any other implementation (including Dataverse's) on [the issues page](https://github.com/leeper/UNF/issues)!

## Why UNFs? ##
## Why UNFs?

While file checksums are a common strategy for verifying a file (e.g., MD5 sums are available for validating R packages), they are not well suited to serve as global signatures for a dataset. A UNF differs from an ordinary file checksum in several important ways:

@@ -55,9 +50,9 @@ While file checksums are a common strategy for verifying a file (e.g., md5 sums

4. *UNFs are strongly tamper-resistant.* Any accidental or intentional change to data values (beyond the chosen rounding precision) will change the resulting UNF. Most file checksums and descriptive statistics detect only certain types of changes.
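The tamper-resistance point, and the contrast with file checksums, can be illustrated with a rough base-R sketch. Two pieces are stand-ins rather than the UNF algorithm: `formatC()` approximates UNF's numeric normalization, and MD5 from `tools::md5sum()` replaces the SHA-256 digest used by the UNF specification. `digest_values()` is a hypothetical helper.

```r
# Sketch only: formatC() stands in for UNF normalization and MD5 (via
# tools::md5sum) stands in for SHA-256. digest_values() is hypothetical.
digest_values <- function(x) {
  # normalize each value to 7 significant digits in exponential notation
  txt <- formatC(as.numeric(x), digits = 6L, format = "e")
  f <- tempfile()
  on.exit(unlink(f))
  writeLines(txt, f)
  unname(tools::md5sum(f))
}

original  <- c(1.5, 2.25, 3)
tampered  <- c(1.5, 2.26, 3)                        # one value altered
from_text <- as.numeric(c("1.50", "2.250", "3.0"))  # same values, different text

digest_values(original) == digest_values(tampered)   # FALSE: change detected
digest_values(original) == digest_values(from_text)  # TRUE: formatting ignored
```

A checksum of raw text such as `1.50,2.250,3.0` would differ from one of `1.5,2.25,3`, whereas a digest of the normalized values does not.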

## Package Functionality ##
## Package Functionality

- `unf`: The core `unf` function calculates the UNF signature for almost any R object for UNF algorithm versions 3, 4, 4.1, 5, or 6, with options to control the rounding of numeric values, truncation of character strings, and some idiosyncratic details of the UNFv5 algorithm as implemented by Dataverse. `unf` is a wrapper for functions `unf6`, `unf5`, `unf4`, and `unf3`, which calculate vector-level UNF signatures.
- `unf()`: The core `unf()` function calculates the UNF signature for almost any R object using UNF algorithm version 3, 4, 4.1, 5, or 6, with options to control the rounding of numeric values, the truncation of character strings, and some idiosyncratic details of the UNFv5 algorithm as implemented by Dataverse. `unf()` is a wrapper for the functions `unf6()`, `unf5()`, `unf4()`, and `unf3()`, which calculate vector-level UNF signatures.

```{r}
unf(iris)
@@ -72,3 +67,40 @@ While file checksums are a common strategy for verifying a file (e.g., md5 sums
unf(iris) %unf% unf(iris[,1:3])
unf(iris) %unf% head(iris[,1:3])
```
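The character-string handling that `unf()` exposes (truncation) can be sketched in base R. `unf_chr_bytes()` is a hypothetical helper, not part of the package, and the byte layout it produces is an assumption based on the UNFv6 specification: UTF-8 text truncated to 128 characters, terminated by a newline and a null byte, with missing values written as three null bytes. Check it against the specification before relying on it.

```r
# Hypothetical helper (not the UNF package API): bytes for one character
# value under the assumed UNFv6 rules described above.
unf_chr_bytes <- function(x, nchars = 128L) {
  if (is.na(x)) {
    return(as.raw(c(0, 0, 0)))  # missing value: three null bytes
  }
  s <- substr(enc2utf8(x), 1L, nchars)  # UTF-8, at most `nchars` characters
  c(charToRaw(s), as.raw(c(10, 0)))     # terminate with "\n" and a null byte
}

# a vector's byte stream is the concatenation of its elements' bytes
vec_bytes <- unlist(lapply(c("setosa", NA), unf_chr_bytes))
length(vec_bytes)  # 11: 6 letters + "\n" + null, then 3 nulls
```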

- `as.unfvector()` is an S3 generic method that converts any R vector into the standardized character representation described by the UNF specification. While this functionality is primarily for internal use, it can help clarify the difference (or lack thereof) between floating-point numbers, or between objects with identical meaning but different class representations (which may have resulted from flawed data importing):

```{r}
# floating point ambiguity
.14*10 == 1.4
as.unfvector(.14*10) == as.unfvector(1.4)
# substantively irrelevant class differences
c(0L, 1L) == c(FALSE, TRUE)
as.unfvector(c(0L, 1L))
as.unfvector(c(FALSE, TRUE))
```
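To make the normalization more concrete, here is a simplified base-R sketch of how a numeric vector could be mapped to exponential-notation strings like the `"+0.e+"` and `"+1.e+"` that `as.unfvector()` returns. `normalize_num()` is hypothetical, not the package's implementation; it assumes 7 significant digits and does not handle `NA`, `NaN`, or `Inf`.

```r
# Hypothetical helper, not the UNF package's implementation: simplified
# UNF-style numeric normalization (NA, NaN and Inf are not handled).
normalize_num <- function(x, digits = 7L) {
  x <- as.numeric(x)  # logical and integer input become 0/1 doubles
  # one leading digit plus (digits - 1) decimals, e.g. "1.400000e+00"
  s <- formatC(abs(x), digits = digits - 1L, format = "e")
  mant <- sub("e.*$", "", s)
  expo <- sub("^.*e", "", s)
  # drop trailing zeros from the mantissa but keep the decimal point
  mant <- sub("0+$", "", mant)
  # exponent sign, then exponent digits without leading zeros (empty if zero)
  exp_sign <- substr(expo, 1L, 1L)
  exp_val <- as.integer(substr(expo, 2L, nchar(expo)))
  exp_digits <- ifelse(exp_val == 0L, "", as.character(exp_val))
  paste0(ifelse(x < 0, "-", "+"), mant, "e", exp_sign, exp_digits)
}

normalize_num(c(0, 1))                          # "+0.e+" "+1.e+"
normalize_num(.14 * 10) == normalize_num(1.4)   # TRUE
normalize_num(c(FALSE, TRUE))                   # "+0.e+" "+1.e+"
normalize_num(c(-300, 0.001))                   # "-3.e+2" "+1.e-3"
```

Rounding to a fixed number of significant digits before formatting is what makes `.14*10` and `1.4` collapse to the same string, even though `==` reports them as different.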

## Installation

[![Build Status](https://travis-ci.org/leeper/UNF.svg?branch=master)](https://travis-ci.org/leeper/UNF)
[![Build status](https://ci.appveyor.com/api/projects/status/tx3dkw1rsr9kijm4?svg=true)](https://ci.appveyor.com/project/leeper/unf)
[![codecov.io](http://codecov.io/github/leeper/UNF/coverage.svg?branch=master)](http://codecov.io/github/leeper/UNF?branch=master)
![Downloads](http://cranlogs.r-pkg.org/badges/UNF)

UNF is on CRAN. To install the latest release, use:

```R
install.packages("UNF")
```

To install the latest development version of **UNF** from GitHub:

```R
# latest (potentially unstable) version from GitHub
if (!requireNamespace("ghit", quietly = TRUE)) {
  install.packages("ghit")
}
ghit::install_github("leeper/UNF")
```

82 changes: 73 additions & 9 deletions README.md
@@ -1,9 +1,4 @@
# Universal Numeric Fingerprint #

[![Build Status](https://travis-ci.org/leeper/UNF.svg?branch=master)](https://travis-ci.org/leeper/UNF)
[![Build status](https://ci.appveyor.com/api/projects/status/tx3dkw1rsr9kijm4?svg=true)](https://ci.appveyor.com/project/leeper/unf)
[![codecov.io](http://codecov.io/github/leeper/UNF/coverage.svg?branch=master)](http://codecov.io/github/leeper/UNF?branch=master)
![Downloads](http://cranlogs.r-pkg.org/badges/UNF)
# Universal Numeric Fingerprint

UNF is a cryptographic hash or signature that can be used to uniquely identify (a version of) a rectangular dataset, or a subset thereof. UNF can be used, in tandem with a DOI or Handle, to form a persistent citation to a versioned dataset. A UNF signature is printed in the following form:

@@ -15,7 +10,7 @@ This allows a data consumer to quickly, easily, and definitively verify an in-ha

Please report any mismatches between this implementation and any other implementation (including Dataverse's) on [the issues page](https://github.com/leeper/UNF/issues)!

## Why UNFs? ##
## Why UNFs?

While file checksums are a common strategy for verifying a file (e.g., MD5 sums are available for validating R packages), they are not well suited to serve as global signatures for a dataset. A UNF differs from an ordinary file checksum in several important ways:

@@ -88,9 +83,9 @@ While file checksums are a common strategy for verifying a file (e.g., md5 sums

4. *UNFs are strongly tamper-resistant.* Any accidental or intentional change to data values (beyond the chosen rounding precision) will change the resulting UNF. Most file checksums and descriptive statistics detect only certain types of changes.

## Package Functionality ##
## Package Functionality

- `unf`: The core `unf` function calculates the UNF signature for almost any R object for UNF algorithm versions 3, 4, 4.1, 5, or 6, with options to control the rounding of numeric values, truncation of character strings, and some idiosyncratic details of the UNFv5 algorithm as implemented by Dataverse. `unf` is a wrapper for functions `unf6`, `unf5`, `unf4`, and `unf3`, which calculate vector-level UNF signatures.
- `unf()`: The core `unf()` function calculates the UNF signature for almost any R object using UNF algorithm version 3, 4, 4.1, 5, or 6, with options to control the rounding of numeric values, the truncation of character strings, and some idiosyncratic details of the UNFv5 algorithm as implemented by Dataverse. `unf()` is a wrapper for the functions `unf6()`, `unf5()`, `unf4()`, and `unf3()`, which calculate vector-level UNF signatures.


```r
@@ -172,3 +167,72 @@ While file checksums are a common strategy for verifying a file (e.g., md5 sums
## Sepal.Width: e6etgUxSU/7XccLSwNzHVQ==
## Petal.Length: oSk42LS4+joAOdTAr9OChQ==
```
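The rounding option in the `unf()` description is what lets a UNF ignore insignificant numeric noise. Here is a minimal base-R illustration that uses `signif()` as a stand-in for UNF's own rounding step. Seven significant digits is assumed as the default, and the perturbed values are purely illustrative.

```r
# signif() stands in for UNF's rounding step; the noise is illustrative only.
x <- c(3.14159265, 2.71828183)
y <- x + 1e-9  # perturbation far below the 7th significant digit

identical(signif(x, 7), signif(y, 7))    # TRUE: difference rounded away
identical(signif(x, 12), signif(y, 12))  # FALSE: visible at 12 digits
```

Raising the number of digits makes the signature sensitive to smaller differences. Lowering it tolerates more import or storage noise, at the cost of missing small genuine changes.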

- `as.unfvector()` is an S3 generic method that converts any R vector into the standardized character representation described by the UNF specification. While this functionality is primarily for internal use, it can help clarify the difference (or lack thereof) between floating-point numbers, or between objects with identical meaning but different class representations (which may have resulted from flawed data importing):


```r
# floating point ambiguity
.14*10 == 1.4
```

```
## [1] FALSE
```

```r
as.unfvector(.14*10) == as.unfvector(1.4)
```

```
## [1] TRUE
```

```r
# substantively irrelevant class differences
c(0L, 1L) == c(FALSE, TRUE)
```

```
## [1] TRUE TRUE
```

```r
as.unfvector(c(0L, 1L))
```

```
## [1] "+0.e+" "+1.e+"
```

```r
as.unfvector(c(FALSE, TRUE))
```

```
## [1] "+0.e+" "+1.e+"
```

## Installation

[![Build Status](https://travis-ci.org/leeper/UNF.svg?branch=master)](https://travis-ci.org/leeper/UNF)
[![Build status](https://ci.appveyor.com/api/projects/status/tx3dkw1rsr9kijm4?svg=true)](https://ci.appveyor.com/project/leeper/unf)
[![codecov.io](http://codecov.io/github/leeper/UNF/coverage.svg?branch=master)](http://codecov.io/github/leeper/UNF?branch=master)
![Downloads](http://cranlogs.r-pkg.org/badges/UNF)

UNF is on CRAN. To install the latest release, use:

```R
install.packages("UNF")
```

To install the latest development version of **UNF** from GitHub:

```R
# latest (potentially unstable) version from GitHub
if (!requireNamespace("ghit", quietly = TRUE)) {
  install.packages("ghit")
}
ghit::install_github("leeper/UNF")
```
