Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the IRanges() constructor compatible with ALTREP #18

Open
hpages opened this issue Aug 22, 2019 · 4 comments
Open

Make the IRanges() constructor compatible with ALTREP #18

hpages opened this issue Aug 22, 2019 · 4 comments
Assignees

Comments

@hpages
Copy link
Contributor

hpages commented Aug 22, 2019

(Moving this from the devteam-bioc Slack to GitHub)

Jiefei wrote on Slack:

Hi Herve, I would like to make some changes to the IRanges package but I might miss something so I want to hear your opinion. For making the IRanges package compatible with ALTREP, the IRanges should be able to use the user's input as its start and width variable, so for the C level constructor:

SEXP solve_user_SEW0(SEXP start, SEXP end, SEXP width)
{
    SEXP ans, ans_start, ans_width;
    int ans_length, i;

    ans_length = LENGTH(start);
    PROTECT(ans_start = NEW_INTEGER(ans_length));
    PROTECT(ans_width = NEW_INTEGER(ans_length));
    for (i = 0; i < ans_length; i++) {
        if (solve_user_SEW0_row(INTEGER(start)[i],
                    INTEGER(end)[i],
                    INTEGER(width)[i],
                    INTEGER(ans_start) + i,
                    INTEGER(ans_width) + i) != 0)
        {
            UNPROTECT(2);
            error("solving row %d: %s", i + 1, errmsg_buf);
        }
    }
    PROTECT(ans = _new_IRanges("IRanges", ans_start, ans_width, R_NilValue));
    UNPROTECT(3);
    return ans;
}

I plan to change it to

SEXP solve_user_SEW0(SEXP start, SEXP end, SEXP width)
{
    SEXP ans;
    int duplicate_num = 0;
    int ans_length = LENGTH(start);
    for (int i = 0; i < ans_length; i++) {
        if (solve_user_SEW0_row(i, 
            &start,&end, &width,
            &duplicate_num
        ) != 0)
        {
            error("solving row %d: %s", i + 1, errmsg_buf);
        }
    }
    PROTECT(ans = _new_IRanges("IRanges", start, width, R_NilValue));
    UNPROTECT(1+ duplicate_num);
    return ans;
}

The function solve_user_SEW0_row will duplicate start and width if it need to change the value of these two variables. The similar pattern is also found in the function solve_user_SEW. I would like to know if there is any concern that prevents us from using the user's input as the IRanges object's slots. Do you think these changes could be a solution for the ALTREP problem?

@hpages hpages self-assigned this Aug 22, 2019
@hpages
Copy link
Contributor Author

hpages commented Aug 22, 2019

Sure. Not duplicating the user input when it doesn't need to be modified sounds like a good general principle anyway, even without ALTREP in the picture. I will suggest a slightly different implementation though. I'm not a big fan of the duplicate_num thing. I found it cleaner and much easier to maintain in the long run to do the UNPROTECTs in the function body that did the PROTECTs. In other words, I've always tried to have each function clean-up after itself rather than delegating the cleaning to the caller. I have a meeting in 20s (at 3pm EST) but will propose something after that.

@hpages
Copy link
Contributor Author

hpages commented Aug 23, 2019

@Jiefei-Wang

Hi Jiefei,

This is done in IRanges 2.19.12: 6eb40e6

Note the two-pass approach. The idea is to do a first pass in order to:

  • Determine whether the supplied start and width contain NAs. If they do, they can't be used as-is.
  • Make sure that all the supplied ranges are valid. This is better done before allocating the new start and/or width vectors (better to fail quickly rather than after a possibly costly memory allocation).

With this version of IRanges:

library(IRanges)
library(XVector)  # for XVector:::address()

start <- 101:105
width <- 7:3
ir1 <- IRanges(start, width=width)
stopifnot(XVector:::address(start(ir1)) == XVector:::address(start))
stopifnot(XVector:::address(width(ir1)) == XVector:::address(width))

width <- c(7:5, NA, NA)
ir2 <- IRanges(start, width=width)
# Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
#   In range 4: at least two out of 'start', 'end', and 'width', must
#   be supplied.

end <- c(NA, NA, NA, 999, 104)
ir3 <- IRanges(start, end, width)
stopifnot(XVector:::address(start(ir3)) == XVector:::address(start))
stopifnot(XVector:::address(width(ir3)) != XVector:::address(width))

start <- c(101:104, NA)
width <- 7:3
end <- c(NA, NA, NA, NA, 10)
ir4 <- IRanges(start, end, width)
stopifnot(XVector:::address(start(ir4)) != XVector:::address(start))
stopifnot(XVector:::address(width(ir4)) == XVector:::address(width))

Let me know how that works in the ALTREP context.

H.

@Jiefei-Wang
Copy link

Hi @hpages ,

From our discussion last week, we can make use of ALTREP if the code can operate on SEXPs instead of data pointers. Would you like me to apply these changes and make a pull request? I will preserve the code structure as possible as I can.

Best,
Jiefei

@hpages
Copy link
Contributor Author

hpages commented Aug 27, 2019

Sounds good Jiefei. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants