Optionally apply merged cells across their range #40

nacnudus · 2018-07-31T12:43:27Z

E.g. if a cell with value "foo" is merged across A1:A2 then cell A2 should be created/overridden as a clone.

The text was updated successfully, but these errors were encountered:

dmcalli2 · 2018-10-18T16:22:25Z

On this topic, I was wondering what the default behavior was with tidyxl when it encounters merged cells.
If, for example, cells A1:B2 is a single merged cell containing the value "some_value", does this translate to the following?

row	col	value
1	1	"some_value"
1	2	NA
2	1	NA
2	2	NA

If so, I think the enhead function in https://github.com/nacnudus/unpivotr will handle this problem fairly easily.

nacnudus · 2018-10-18T17:12:56Z

That's correct, and is the reason why I haven't thought of a need for this yet. It might still be useful to know when a cell is part of a merge.

ianmoran11 · 2019-03-30T06:05:01Z

This would be super useful for tidyABS. At the moment, I'm inferring whether cells are merged based on either style or local_format_id (see here). Whether style or local_format_id work depends on the spreadsheet. A more reliable approach to unmerging would be great.

nacnudus · 2019-03-30T21:15:57Z

@ianmoran11 thanks for commenting here. I've skimmed ianmoran11/tidyABS@d34fd3f to try to understand the use of unmerging cells, but I don't understand yet. Is it part of guessing which cells are headers? If so, would it be enough to know that a cell is merged? That would be much easier to implement than the next step, which is to clone that cell across the merged range.

ianmoran11 · 2019-04-07T12:35:58Z

Sorry, I missed your reply. Unmerging column headers, in ABS tables, increases the chance that N gives the correct relationship between the table data and the header (and this seems to be true of many other tables too). Without unmerging the column header, tidyABS would need to guess between NNW or N for each column header group, which turns out to be difficult.

Knowing whether the cell is merged would allow me to know with more certainty whether I should replace a cell's value with the value of the cell to the left.

I'm hope that explanation is clear - happy to provide a reprex to clarify.

nacnudus · 2019-04-07T14:48:02Z

I see. Could tidyABS use NNW every time? N is a convenient way to filter out cells that are outside the table (they don't have any headers), but otherwise it makes no difference. If tidyABS has another way to drop cells outside the table, then I would recommend this approach because it generalises to header cells that haven't been merged.

ianmoran11 · 2019-04-08T09:47:16Z

Using NNW every time was my initial approach, but it runs into trouble with some tables. See, for example, the table below.

By the way, I don't have a good understanding of how much work would be involved in adding this feature, and tidyABS gets by most of the time with its current approach to unmerging and filling cells, so this is far from an urgent request - just a nice-to-have.

nacnudus · 2019-04-08T10:54:38Z

Excellent example. I'm persuaded.

I had a go at this last night and it's just fiddly, not actually difficult.

bedantaguru · 2019-06-11T11:51:34Z

Alternatively, two additional columns can be added
is_merged_cell, merged_cell_id

is_merged_cell should be TRUE for merged cells

The merged_cell_id can be given sequentially (as per reading sequence maintained in xlsx_cells) over entire xlsx file. For each cell in a single merged cell (which is a set of cells) should receive the same id and in case of non merged cell that can be simply NA.

dmcalli2 · 2019-06-11T12:15:27Z

Just wanted to say that this would be helpful for my usage too. @rpsoft I am using tidyxl to re-arrange tables from published papers into a tidy format.
The suggestion above is great,

Alternatively, two additional columns can be added
is_merged_cell, merged_cell_id

is_merged_cell should be TRUE for merged cells

The merged_cell_id can be given sequentially (as per reading sequence maintained in xlsx_cells) over entire xlsx file. For each cell in a single merged cell (which is a set of cells) should receive the same id and in case of non merged cell that can be simply NA.

However, I wonder if merged_cell_id should not be NA, but rather should be the address for the top left cell, and in the case of single cells, this is of course just the cell reference.
This would fit in well wiht the tidyverse, as it would make it easier to do grouped operations on merged cells, eg

## Repeat the character string across the range of merged cells
mydf %>%
  arrange(row, cell) %>%
  group_by(merged_cell_id) %>%
  mutate(character = character[1]) %>%
  ungroup()

nacnudus · 2019-06-11T21:15:03Z

Thanks @bedantaguru and @dmcalli2 for the suggestions.

What do you think about:

not having an is_merged column, and
having both merged_cell_sheet and merged_cell_address columns to store the full address of the top-left cell in the merge?

dmcalli2 · 2019-06-11T21:36:55Z

Sounds great. Do you think that it would be good to set the value of merged_cell_address to the cell value when it is not a merged cell so it can be grouped-on in a single step? Or is that counter-intuitive for users?
I suppose it would only require an extra step if merged_cell_address is set to NA for non-merged cells.

mydf %>%
  arrange(row, cell) %>%
  mutate(merge_on= if_else(is.na(merged_cell_address), cell_address, merged_cell_address) %>% 
  group_by(merge_on) %>%
  mutate(character = character[1]) %>%
  ungroup()

nacnudus · 2019-06-11T21:39:49Z

Oh yes, I meant to ask what value you would expect merged cells to have (other than the top-left). I had assumed they ought to be pre-populated with the value of the top-left cell, in which case your example wouldn't be necessary.

dmcalli2 · 2019-06-12T00:58:23Z

I think you are right.

bedantaguru · 2019-06-13T10:37:45Z

Thanks @bedantaguru and @dmcalli2 for the suggestions.

What do you think about:

not having an is_merged column, and

having both merged_cell_sheet and merged_cell_address columns to store the full address of the top-left cell in the merge?

@nacnudus agree to not have is_merged but in that case, why do we need two columns.
I'm sorry but I did not get why we need merged_cell_sheet (sheet column is already there)(can a merged cell spread across sheets ? if so then fine.)

I guess as you suggested merged_cell_address makes sense. (then you don't need to do it optionally; in xlsx_cells you can have it by default)

In fact, your initial suggestion is also perfect (to optionally populate same fields for all cells in the merge). But in this case, you need an optional argument in xlsx_cells.

If the operation is not costly then you may choose the first one otherwise go for the second.

Either way, it looks perfectly fine.

bedantaguru · 2019-06-13T11:04:45Z

I can see the xlsx contains information like these :
<mergeCells count="1"><mergeCell ref="A1:B1"/></mergeCells>
for merged cells.

dmcalli2 · 2019-06-25T10:25:14Z

Is the desired behaviour agreed now (see below) and if so can we (myself and @rpsoft) help with the implementation, as this feature would be very useful for our current project?

Desired behaviour

Repeat top left value across all merged cells. Do so as the default with the option to suppress this behaviour xlsx_cells(repeat_across_merged = TRUE) .

nacnudus · 2019-06-25T22:13:15Z

@bedantaguru you are right, merged_cell_sheet is not necessary.

@dmcalli2 and @rpsoft thank you for offering to help with the implementation, which will be as follows (suggestions still accepted though!):

A new column merged_cell_address in the data frame returned by xlsx_cells(). It will be NA for cells that are not merged, otherwise it will be the address of the top-left cell in the merge. The top-left cell will have its own address in the merged_cell_address column. This will need to be in the C++ code.
Functions unmerge_cells(cells) and merge_cells(cells). They will need to cope when cells from different sheets are in the same data frame, and they should do something sensible when the cells to be merged or unmerged don't exist (i.e. the rectangle isn't complete, especially if the top-left cell is missing). These should be fine in R code, the speed should be fine.

I haven't checked whether the formatting of a merged cell is the same as the top-left cell in the merge. Hopefully it is, otherwise unmerging will be a Grade A pain.

Sorry for being slow to develop this. I don't have much time to spend on tidyxl at the moment.

dmcalli2 · 2019-06-26T13:00:29Z

Great. I'll fork the repo, talk with @rpsoft and make a plan.

bedantaguru · 2022-06-28T16:45:00Z

How are you all guys..

Have you given it a try...

miljukov-o · 2023-06-14T20:45:01Z

Hi, also desperately needing this feature...

nacnudus added the enhancement label Jul 31, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optionally apply merged cells across their range #40

Optionally apply merged cells across their range #40

nacnudus commented Jul 31, 2018

dmcalli2 commented Oct 18, 2018

nacnudus commented Oct 18, 2018

ianmoran11 commented Mar 30, 2019 •

edited

Loading

nacnudus commented Mar 30, 2019

ianmoran11 commented Apr 7, 2019

nacnudus commented Apr 7, 2019

ianmoran11 commented Apr 8, 2019

nacnudus commented Apr 8, 2019

bedantaguru commented Jun 11, 2019

dmcalli2 commented Jun 11, 2019

nacnudus commented Jun 11, 2019

dmcalli2 commented Jun 11, 2019

nacnudus commented Jun 11, 2019

dmcalli2 commented Jun 12, 2019

bedantaguru commented Jun 13, 2019

bedantaguru commented Jun 13, 2019

dmcalli2 commented Jun 25, 2019

nacnudus commented Jun 25, 2019

dmcalli2 commented Jun 26, 2019

bedantaguru commented Jun 28, 2022

miljukov-o commented Jun 14, 2023

Optionally apply merged cells across their range #40

Optionally apply merged cells across their range #40

Comments

nacnudus commented Jul 31, 2018

dmcalli2 commented Oct 18, 2018

nacnudus commented Oct 18, 2018

ianmoran11 commented Mar 30, 2019 • edited Loading

nacnudus commented Mar 30, 2019

ianmoran11 commented Apr 7, 2019

nacnudus commented Apr 7, 2019

ianmoran11 commented Apr 8, 2019

nacnudus commented Apr 8, 2019

bedantaguru commented Jun 11, 2019

dmcalli2 commented Jun 11, 2019

nacnudus commented Jun 11, 2019

dmcalli2 commented Jun 11, 2019

nacnudus commented Jun 11, 2019

dmcalli2 commented Jun 12, 2019

bedantaguru commented Jun 13, 2019

bedantaguru commented Jun 13, 2019

dmcalli2 commented Jun 25, 2019

Desired behaviour

nacnudus commented Jun 25, 2019

dmcalli2 commented Jun 26, 2019

bedantaguru commented Jun 28, 2022

miljukov-o commented Jun 14, 2023

ianmoran11 commented Mar 30, 2019 •

edited

Loading