Segmentation fault when trying to assign value to feature['score'] #83

Open
alanlhutchison opened this issue Apr 18, 2013 · 1 comment

Comments

@alanlhutchison

Hi,

I took the code from a previous thread (#82) and now wish to modify it so that I can assign a value to the previously empty 'score' field. I have some datasets from GEO that have a number in the 4th column, which I believe corresponds to the score rather than the name of the feature. I have reflected this format in creating "b_test.bed" in the code below. I would like to take that 'feature.name' and make it the 'feature.score', and then rename the 'feature.name' as I was doing in the previous issue.

I have created a function 'is_number()' to determine if the given string is a number.

When I run this code I receive a segmentation fault, which is also what occurs if I simply try to set feature.score equal to a numerical string (such as '1').

Do you have any insight into this problem? Your time and effort are appreciated.

import pybedtools

# Create two example files so that this example is self-contained.
pybedtools.BedTool('''
chr2L 20 30 TestD1 23.23
chr2L 45 60 TestD2 24.24''', from_string=True).saveas('a_test.bed')

pybedtools.BedTool('''
chr3L 500 600 11.11
chr3L 900 1000 12.12''', from_string=True).saveas('b_test.bed')


def gen_change_name(fn, GSM):
    '''
    This generator accepts a filename and a string, and yields
    pybedtools.Interval features with names changed according to `GSM` and line
    number.
    '''
    for i, feature in enumerate(pybedtools.BedTool(fn)):
        print(feature.name)
        print(feature.score)
        if is_number(feature.name):
            if feature.score == '':
                feature.score = feature.name
        print(feature.score)
        feature.name = GSM + '_{0}'.format(i + 1)
        yield feature


def change_name(fn):
    '''
    This function accepts a filename and creates a new file with changed names.
    It returns a BedTool pointing to this new file.
    '''
    GSM = fn.split('_')[0]

    # This is the key: BedTool objects can be created from a generator of
    # pybedtools.Interval objects...which is what gen_change_name was designed
    # to do.
    return pybedtools.BedTool(gen_change_name(fn, GSM))\
        .saveas(fn + '.changed')


def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


for fn in ('a_test.bed', 'b_test.bed'):
    original = pybedtools.BedTool(fn)
    changed = change_name(fn)
    print('original', fn)
    print(original)
    print('changed', fn)
    print(changed)

@daler
Owner

daler commented Apr 18, 2013

Thanks for pointing this out. This happens because the example BED file (saved as b_test.bed) is in BED4 format, so pybedtools allocates memory for only a 4-item list, and setting the score (the fifth field) has no slot to write to. One solution is to use pybedtools.featurefuncs.extend_fields to pad the interval out to 5 fields, after which the assignment should work as expected. This example demonstrates:

import pybedtools

b = pybedtools.BedTool('''
chr3L 500 600 11.11
chr3L 900 1000 12.12''', from_string=True).saveas('b_test.bed')

# Just get one interval to work with...
i = b[0]

# The following results in a segfault -- `i` does not have enough fields allocated
# because the format is BED4:
# i.score = i.name


# Solution: extend to BED5
from pybedtools.featurefuncs import extend_fields
i = extend_fields(i, 5)
i.score = i.name
print(i)
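
As a rough analogy for why the padding matters, the sketch below uses plain Python lists in place of Interval objects. It is not how pybedtools is implemented: `pad_fields` is a hypothetical helper, and the "." filler is an assumption about what padding looks like. Writing to a slot that does not exist fails cleanly in pure Python, whereas the equivalent write at the C level is what crashes.

```python
def pad_fields(fields, n, filler="."):
    """Return a copy of `fields` padded with `filler` to at least `n` items.

    Hypothetical helper for illustration only; not part of pybedtools.
    """
    return list(fields) + [filler] * max(0, n - len(fields))


# A BED4 record: chrom, start, end, name. There is no fifth (score) slot.
bed4 = ["chr3L", "500", "600", "11.11"]

# Assigning to index 4 of a four-item list fails. In pure Python this is an
# IndexError rather than a crash.
try:
    bed4[4] = bed4[3]
    failed = False
except IndexError:
    failed = True

# After padding to five fields, the score slot exists and can be assigned.
bed5 = pad_fields(bed4, 5)
bed5[4] = bed5[3]

print(failed)
print(bed5)
```

Records that already have five or more fields pass through `pad_fields` unchanged, so it is safe to apply to mixed BED4 and BED5 input.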

In general for the use-case you describe, try:

def name2score(f):
    f = extend_fields(f, 5)
    f.score, f.name = f.name, '.'
    return f

print(b.each(name2score))
# chr3L   500   600   .   11.11
# chr3L   900   1000  .   12.12
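
For the full transformation described in the original question (move a numeric name into an empty score, then rename by prefix and line number), the logic can be sketched on plain field lists, independent of pybedtools. The function names `move_name_to_score` and `relabel` are hypothetical, and treating both "" and "." as an empty score is an assumption, not documented pybedtools behavior.

```python
def is_number(s):
    """Return True if the string `s` parses as a float."""
    try:
        float(s)
        return True
    except ValueError:
        return False


def move_name_to_score(fields, new_name):
    """Pad to five fields, move a numeric name into an empty score, rename."""
    # Pad first so that the score slot (index 4) always exists.
    fields = list(fields) + ["."] * max(0, 5 - len(fields))
    name, score = fields[3], fields[4]
    # Assumption: "" and "." both mean "no score present".
    if is_number(name) and score in ("", "."):
        fields[4] = name
    fields[3] = new_name
    return fields


def relabel(lines, gsm):
    """Apply move_name_to_score to each whitespace-separated BED line."""
    out = []
    for i, line in enumerate(lines, start=1):
        new_name = "{0}_{1}".format(gsm, i)
        out.append("\t".join(move_name_to_score(line.split(), new_name)))
    return out


a_lines = ["chr2L 20 30 TestD1 23.23", "chr2L 45 60 TestD2 24.24"]
b_lines = ["chr3L 500 600 11.11", "chr3L 900 1000 12.12"]

for row in relabel(a_lines, "a") + relabel(b_lines, "b"):
    print(row)
```

With pybedtools itself, the same steps would presumably go inside a function passed to each(), calling extend_fields before any assignment, as in name2score.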

I'll play around with the cbedtools.Interval.score setter method to get it to allocate enough room in the list (same with the name and strand setters) so that things work as expected without the extend_fields workaround.
