Ubuntu Linux 16.04 LTS | R-devel with rchk | PREPERROR |
diff --git a/docs/favicon-16x16.png b/docs/favicon-16x16.png
index 79b621e..c3223b8 100644
Binary files a/docs/favicon-16x16.png and b/docs/favicon-16x16.png differ
diff --git a/docs/favicon-32x32.png b/docs/favicon-32x32.png
index af0d2cd..25eaab5 100644
Binary files a/docs/favicon-32x32.png and b/docs/favicon-32x32.png differ
diff --git a/docs/index.html b/docs/index.html
index 7269f61..00bd5f2 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -49,7 +49,7 @@
tidycells
- 0.2.0
+ 0.2.1
@@ -128,7 +128,13 @@
TL;DR
Given a file_name
which is a path of a file that contains table(s). Run this read_cells()
in the R-console to see whether support is present for the file type. If support is present, just run
-Note Just start with a small file.
+Note
+
+- Just start with a small file, as the heuristic algorithms are not well optimized (yet).
+- If the target table has numerical values as data and text as their attributes (identifiers of the data elements), the straightforward method is sufficient in most situations. Otherwise, you may need to use other functions.
+
+A Word of Warning:
+Many functions in this package are based on heuristic algorithms, so outcomes may be unexpected. I recommend first trying read_cells on the target file. If the outcome is what you expect, you are done. If not, try again with read_cells(file_name, at_level = "compose"). If the output is still not as expected, other functions are required. In that case, start again with read_cells(file_name, at_level = "make_cells") and proceed to further functions.
After this you need to run compose_cells
(with argument print_attribute_overview = TRUE
)
- If you want a well-aligned columns then you may like to do
+
+If you want well-aligned columns, you may like to do
# bit tricky and tedious unless you do print_attribute_overview = TRUE in above line
dcfine <- dc %>%
dplyr::mutate(name = dplyr::case_when(
@@ -472,7 +479,10 @@
The rsheets project: It hosts several R packages (few of them are in CRAN already) which are in the early stages of importing spreadsheets from Excel and Google Sheets into R. Specifically, have a look at these projects which seems closely related to these projects : jailbreaker, rexcel (README of this project has a wonderful reference for excel integration with R).
readabs: Download and Tidy Time Series Data from the Australian Bureau of Statistics The readabs
package helps you easily download, import, and tidy time series data from the Australian Bureau of Statistics from within R. This saves you time manually downloading and tediously tidying time series data and allows you to spend more time on your analysis.
+
+ezpickr: Easy Data Import Using GUI File Picker and Seamless Communication Between an Excel and R Gives ability for choosing any rectangular data file using interactive GUI dialog box, and seamlessly manipulating tidy data between an ‘Excel’ window and R session.
The tidyABS package: The tidyABS
package converts ABS excel tables to tidy data frames. It uses rules-of-thumb to determine the structure of excel tables, however it sometimes requires pointers from the user. This package is in early development.
+The hypoparsr package: This package takes a different approach to CSV parsing by creating different parsing hypotheses for a given file and ranking them based on data quality features.
@@ -532,13 +542,14 @@
Developers
Dev status
diff --git a/docs/news/index.html b/docs/news/index.html
index 0fd4cb7..db23fb2 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -70,7 +70,7 @@
tidycells
- 0.2.0
+ 0.2.1
@@ -125,6 +125,25 @@ Changelog
Source: NEWS.md
+
+
+
+
+New features
+
+- Enhancements to the heuristic-based algorithm
+
+
+
+
+Other changes
+
+- Now, if read_cells fails at an intermediate stage, it returns the output of the last successful stage
+
+
+
@@ -143,9 +160,9 @@
-
+
-New Features
+
New Features
- Added
collate_columns
to collate attribute-columns having similar content.
@@ -160,7 +177,7 @@
Initial Public Release
- Initial Release to GitHub
-- Prior to this it was private package
+- Prior to this it was a private package
@@ -170,6 +187,7 @@
Contents
+ - 0.2.1
- 0.2.0
- 0.1.9
- 0.1.0
diff --git a/docs/reference/analyse_cells.html b/docs/reference/analyse_cells.html
index c4d61a7..8d27c21 100644
--- a/docs/reference/analyse_cells.html
+++ b/docs/reference/analyse_cells.html
@@ -78,7 +78,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/analyze_cells.html b/docs/reference/analyze_cells.html
index fdf0a1b..1d53796 100644
--- a/docs/reference/analyze_cells.html
+++ b/docs/reference/analyze_cells.html
@@ -77,7 +77,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/as_cell_df.html b/docs/reference/as_cell_df.html
index b7f6025..a308307 100644
--- a/docs/reference/as_cell_df.html
+++ b/docs/reference/as_cell_df.html
@@ -73,7 +73,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/basic_classifier.html b/docs/reference/basic_classifier.html
index 3b51956..69327ed 100644
--- a/docs/reference/basic_classifier.html
+++ b/docs/reference/basic_classifier.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/cell_analysis-class.html b/docs/reference/cell_analysis-class.html
index c395b67..f960b14 100644
--- a/docs/reference/cell_analysis-class.html
+++ b/docs/reference/cell_analysis-class.html
@@ -73,7 +73,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/cell_composition_traceback.html b/docs/reference/cell_composition_traceback.html
index 692f5a0..487388f 100644
--- a/docs/reference/cell_composition_traceback.html
+++ b/docs/reference/cell_composition_traceback.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/cell_df-class.html b/docs/reference/cell_df-class.html
index 5e57f62..f7fcdea 100644
--- a/docs/reference/cell_df-class.html
+++ b/docs/reference/cell_df-class.html
@@ -73,7 +73,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/collate_columns.html b/docs/reference/collate_columns.html
index d051b14..4933cf8 100644
--- a/docs/reference/collate_columns.html
+++ b/docs/reference/collate_columns.html
@@ -73,7 +73,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/compose_cells.html b/docs/reference/compose_cells.html
index 76a26fc..f990d64 100644
--- a/docs/reference/compose_cells.html
+++ b/docs/reference/compose_cells.html
@@ -73,7 +73,7 @@
tidycells
- 0.2.0
+ 0.2.1
@@ -137,7 +137,8 @@ Compose a Cell Analysis to a tidy form
compose_cells(ca, post_process = TRUE, attr_sep = " :: ",
- discard_raw_cols = FALSE, print_attribute_overview = FALSE)
+ discard_raw_cols = FALSE, print_attribute_overview = FALSE,
+ silent = FALSE)
Arguments
@@ -162,6 +163,10 @@ Arg
print_attribute_overview |
print the overview of the attributes (4 distinct values from each attribute of each block) |
+
+ silent |
+ whether to suppress the warning message on compose failure (default FALSE) |
+
Value
diff --git a/docs/reference/get_direction.html b/docs/reference/get_direction.html
index 3d5189b..a33353f 100644
--- a/docs/reference/get_direction.html
+++ b/docs/reference/get_direction.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/get_direction_df.html b/docs/reference/get_direction_df.html
index 7e25939..9e7a4e6 100644
--- a/docs/reference/get_direction_df.html
+++ b/docs/reference/get_direction_df.html
@@ -75,7 +75,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/get_direction_metric.html b/docs/reference/get_direction_metric.html
index 19a45fd..7d6d5dc 100644
--- a/docs/reference/get_direction_metric.html
+++ b/docs/reference/get_direction_metric.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/get_group_id.html b/docs/reference/get_group_id.html
index e9dbe7d..aad1183 100644
--- a/docs/reference/get_group_id.html
+++ b/docs/reference/get_group_id.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/get_unpivotr_direction_names.html b/docs/reference/get_unpivotr_direction_names.html
index 0ef932e..3224c01 100644
--- a/docs/reference/get_unpivotr_direction_names.html
+++ b/docs/reference/get_unpivotr_direction_names.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/index.html b/docs/reference/index.html
index a50dbe7..dd0ed94 100644
--- a/docs/reference/index.html
+++ b/docs/reference/index.html
@@ -70,7 +70,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/numeric_values_classifier.html b/docs/reference/numeric_values_classifier.html
index 33cc1cc..370380a 100644
--- a/docs/reference/numeric_values_classifier.html
+++ b/docs/reference/numeric_values_classifier.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/pipe.html b/docs/reference/pipe.html
index 570d933..957a08e 100644
--- a/docs/reference/pipe.html
+++ b/docs/reference/pipe.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/read_cell_part-class.html b/docs/reference/read_cell_part-class.html
index 9adc84e..20799f2 100644
--- a/docs/reference/read_cell_part-class.html
+++ b/docs/reference/read_cell_part-class.html
@@ -75,7 +75,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/read_cells.html b/docs/reference/read_cells.html
index c73d8bd..ec28f9d 100644
--- a/docs/reference/read_cells.html
+++ b/docs/reference/read_cells.html
@@ -46,7 +46,21 @@
the installed packages. To see the list of supported files and potentially required packages (if any) just
run read_cells() in the console. This function supports the file format based on content and not based on just the file
extension. That means if a file is saved as pdf and then the extension is removed (or extension modified to say .xlsx)
-then also the read_cells will detect it as pdf and read its content." />
+then also the read_cells will detect it as pdf and read its content.
+Note:
+read_cells is supposed to work for any kind of data. However, if it fails at an intermediate stage, it raises
+a warning and returns the results up to the last successfully processed stage.
+The heuristic algorithms are not well optimized (yet), so they may be slow on large files.
+If the target table has numerical values as data and text as their attributes (identifiers of the data elements),
+the straightforward method is sufficient in most situations. Otherwise, you may need to use other functions.
+" />
+
+
@@ -79,7 +93,7 @@
tidycells
- 0.2.0
+ 0.2.1
@@ -145,6 +159,20 @@ Read Cells from file
run read_cells()
in the console. This function supports the file format based on content and not based on just the file
extension. That means if a file is saved as pdf and then the extension is removed (or extension modified to say .xlsx
)
then also the read_cells
will detect it as pdf and read its content.
+Note:
+read_cells is supposed to work for any kind of data. However, if it fails at an intermediate stage, it raises
+a warning and returns the results up to the last successfully processed stage.
+The heuristic algorithms are not well optimized (yet), so they may be slow on large files.
+If the target table has numerical values as data and text as their attributes (identifiers of the data elements),
+the straightforward method is sufficient in most situations. Otherwise, you may need to use other functions.
+
+
+A Word of Warning:
+The functions used inside read_cells are based on heuristic algorithms, so outcomes may be unexpected.
+It is recommended to try read_cells on the target file first. If the outcome is as expected, you are done.
+If not, try again with read_cells(file_name, at_level = "compose"). If the output is still not as expected,
+other functions are required. In that case, start again with read_cells(file_name, at_level = "make_cells")
+and proceed to further functions.
@@ -219,8 +247,10 @@ Examp
read_cells()
#> Please provide a valid file path to process.
#> Support present for following type of files: csv, xls, xlsx, doc, docx, pdf, html
#> Note:
-#> = LibreOffice present so doc files will be supported but it may take little longer time to read/detect.
-#> You may need to open LibreOffice outside this R-Session manually to speed it up.
+#> = LibreOffice is present so doc files will be supported but it may take little longer time to read/detect.
+#> You may need to open LibreOffice outside this R-Session manually to speed it up.
+#> In case the doc is not working, try running docxtractr::read_docx('<target doc file>').
+#> Check whether the file is being read correctly.
#> = Support is enabled for content type (means it will work even if the extension is wrong)
#>
#> Details:
@@ -246,10 +276,10 @@ Examp
read_cells(fcsv)
#> # A tibble: 4 x 5
#> collated_1 collated_2 collated_3 table_tag value
#> <chr> <chr> <chr> <chr> <chr>
-#> 1 Nakshatra Weight Kid Name Table_1 12
-#> 2 Titas Weight Kid Name Table_1 16
-#> 3 Nakshatra Age Kid Name Table_1 1.5
-#> 4 Titas Age Kid Name Table_1 6
read_cells(fcsv, simplify = FALSE)
#> A partial read_cell
+#> 1 Weight Nakshatra Kid Name Table_1 12
+#> 2 Weight Titas Kid Name Table_1 16
+#> 3 Age Nakshatra Kid Name Table_1 1.5
+#> 4 Age Titas Kid Name Table_1 6
read_cells(fcsv, simplify = FALSE)
#> A partial read_cell
#> At stage collate
diff --git a/docs/reference/sample_based_classifier.html b/docs/reference/sample_based_classifier.html
index e639d04..e910890 100644
--- a/docs/reference/sample_based_classifier.html
+++ b/docs/reference/sample_based_classifier.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/tidycells-package.html b/docs/reference/tidycells-package.html
index 70bb808..24ab161 100644
--- a/docs/reference/tidycells-package.html
+++ b/docs/reference/tidycells-package.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/validate_cells.html b/docs/reference/validate_cells.html
index f88132e..a378fed 100644
--- a/docs/reference/validate_cells.html
+++ b/docs/reference/validate_cells.html
@@ -72,7 +72,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/value_attribute_classify.html b/docs/reference/value_attribute_classify.html
index e643680..26f807e 100644
--- a/docs/reference/value_attribute_classify.html
+++ b/docs/reference/value_attribute_classify.html
@@ -75,7 +75,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/visual_functions.html b/docs/reference/visual_functions.html
index ff832ec..ffdf4ac 100644
--- a/docs/reference/visual_functions.html
+++ b/docs/reference/visual_functions.html
@@ -74,7 +74,7 @@
tidycells
- 0.2.0
+ 0.2.1
diff --git a/pkgdown/favicon/apple-touch-icon-120x120.png b/pkgdown/favicon/apple-touch-icon-120x120.png
index 61352c8..dd7b857 100644
Binary files a/pkgdown/favicon/apple-touch-icon-120x120.png and b/pkgdown/favicon/apple-touch-icon-120x120.png differ
diff --git a/pkgdown/favicon/apple-touch-icon-60x60.png b/pkgdown/favicon/apple-touch-icon-60x60.png
index 62d9721..d76c8f7 100644
Binary files a/pkgdown/favicon/apple-touch-icon-60x60.png and b/pkgdown/favicon/apple-touch-icon-60x60.png differ
diff --git a/pkgdown/favicon/apple-touch-icon-76x76.png b/pkgdown/favicon/apple-touch-icon-76x76.png
index f74cdbe..3364bd5 100644
Binary files a/pkgdown/favicon/apple-touch-icon-76x76.png and b/pkgdown/favicon/apple-touch-icon-76x76.png differ
diff --git a/pkgdown/favicon/apple-touch-icon.png b/pkgdown/favicon/apple-touch-icon.png
index 5320c2d..09d52ad 100644
Binary files a/pkgdown/favicon/apple-touch-icon.png and b/pkgdown/favicon/apple-touch-icon.png differ
diff --git a/pkgdown/favicon/favicon-16x16.png b/pkgdown/favicon/favicon-16x16.png
index 79b621e..c3223b8 100644
Binary files a/pkgdown/favicon/favicon-16x16.png and b/pkgdown/favicon/favicon-16x16.png differ
diff --git a/pkgdown/favicon/favicon-32x32.png b/pkgdown/favicon/favicon-32x32.png
index af0d2cd..25eaab5 100644
Binary files a/pkgdown/favicon/favicon-32x32.png and b/pkgdown/favicon/favicon-32x32.png differ
diff --git a/tests/testthat/test-etc.R b/tests/testthat/test-etc.R
index c2fd6ea..525a991 100644
--- a/tests/testthat/test-etc.R
+++ b/tests/testthat/test-etc.R
@@ -23,11 +23,41 @@ test_that("etc works", {
expect_equal(norm_this(0.6), 1)
expect_equal(norm_this(0.1), 0)
+ # when fj is called through purrr::reduce, covr does not see it,
+ # so it gets a separate test here
+ expect_equal(
+ fj(tibble(x = c(1, 2), y = 2), tibble(x = c(2, 3), y0 = 2), join_by = "x"),
+ tibble(x = c(1, 2, 3), y = c(2, 2, NA), y0 = c(NA, 2, 2))
+ )
+
+ expect_error(
+ fj(tibble(x = c(1, 2), y = 2), tibble(x = c(2, 3), y = 2), join_by = "x"),
+ "unexpected error while joining"
+ )
+
+ expect_equal(
+ fj(tibble(x = c(1, 2), y = 2), tibble(x = c(2, 3), y = 2), join_by = "x", sallow_join = TRUE),
+ tibble(x = c(1, 2, 3), y = c(2, 2, ""))
+ )
+
+ expect_equal(
+ fj(tibble(x = c(1, 2), y = 2), tibble(x = c(2, 3), y = c(2, NA)), join_by = "x", sallow_join = TRUE),
+ tibble(x = c(1, 2, 3), y = c(2, 2, ""))
+ )
+
+ expect_equal(
+ fj(tibble(x = c(1, 2), y = 2), tibble(x = c(2, 3), y = 3), join_by = "x", sallow_join = TRUE, sep = "+"),
+ tibble(x = c(1, 2, 3), y = c("2+", "2+3", "+3"))
+ )
+
dc0 <- readRDS("testdata/enron_from_unpivotr_processed.rds") %>%
analyze_cells()
- expect_warning(dc00 <- dc0 %>%
- compose_cells_raw(post_process = FALSE, ask_user = FALSE), "failed to compose")
+ expect_warning(
+ dc00 <- dc0 %>%
+ compose_cells_raw(post_process = FALSE, ask_user = FALSE),
+ "failed to compose"
+ )
dc01 <- dc00 %>%
collate_columns(combine_threshold = 0.1)
diff --git a/tests/testthat/test-read_cells.R b/tests/testthat/test-read_cells.R
index 230f022..fdb1a17 100644
--- a/tests/testthat/test-read_cells.R
+++ b/tests/testthat/test-read_cells.R
@@ -89,6 +89,11 @@ test_that("read_cells: (for csv) chains works", {
lvlsdchk <- lvlsd[1:6] %>% purrr::map(read_cells)
lvldchk <- lvld %>% purrr::map(read_cells)
+ expect_identical(
+ read_cells(lvlsd[[5]], from_level = 4),
+ read_cells(lvlsd[[5]])
+ )
+
expect_error(read_cells(lvlsd[[7]]), "No 'read_cells_stage' attribute found!")
expect_true(lvlsdchk %>% purrr::map_lgl(~ identical(.x, lvlsdchk[[1]])) %>% all())
expect_true(lvldchk %>% purrr::map_lgl(~ identical(.x, lvldchk[[1]])) %>% all())
diff --git a/vignettes/ext/compose_cells_cli1.png b/vignettes/ext/compose_cells_cli1.png
index aa06356..f18c652 100644
Binary files a/vignettes/ext/compose_cells_cli1.png and b/vignettes/ext/compose_cells_cli1.png differ
diff --git a/vignettes/tidycells-intro.Rmd b/vignettes/tidycells-intro.Rmd
index 856f877..aefade0 100644
--- a/vignettes/tidycells-intro.Rmd
+++ b/vignettes/tidycells-intro.Rmd
@@ -395,7 +395,7 @@ dcfine <- dc %>%
knitr::kable(head(dcfine), align = c(rep("l", 3), "c"))
```
-This is still not good right! You had to manually pick some weird column-names and spent some brain (when it was evident from data which columns should be aligned with whom).
+This is still not good, right? You had to manually pick some weird column names (even though it was evident from the data which columns should be aligned with which).
The `collate_columns` functions does exactly this for you. So instead of manually picking column-names after compose cells you can simply run
@@ -407,7 +407,7 @@ collate_columns(dc) %>%
knitr::kable(head(collate_columns(dc)), align = c(rep("l", 5), "c"))
```
-Looks like staged example! Yes you are right this is not always perfect (same is true for `analyze_cells` also). However, if the data is somehow helpful in demystifying underlying columns structure (like this one), then this will be useful.
+Looks like a staged example! Yes, you are right: this is not always perfect (the same is true for `analyze_cells`). However, if the data itself helps reveal the underlying column structure (as in this one), this will be useful.
Once again, these functions `read_cells` (all functionalities combined), `analyze_cells`, `collate_columns` are here to ease your pain in data wrangling and reading from various sources. It may not be full-proof solution to all types of tabular data. It is always recommended to perform these tasks manually whenever expected results are not coming.