Error when a new abstract has no ArticleDate and PubDate has no month, only year #1

Open
mjafin opened this issue Nov 15, 2023 · 2 comments

Comments

mjafin commented Nov 15, 2023

Hi @suqingdong,
Thank you for this fantastic package, I'm finding it super useful for my research. I'm going through some newly released abstracts and am hitting an error:

ParserError: Unknown string format: 2023-None-1

When I traced this back to the code, it's coming from

pdat = util.check_date(Article.find('ArticleDate') if Article.find('ArticleDate') is not None else Article.find('Journal/JournalIssue/PubDate'))

and further

def check_date(element):
    year = element.findtext('Year')
    month = element.findtext('Month')
    day = element.findtext('Day') or '1'

    return parse_date(f'{year}-{month}-{day}')

The issue here is that the article (PMID 36911757) currently has no ArticleDate and PubDate only has year in it, so the month doesn't parse. Any thoughts on how to address this?
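One possible way to tolerate a missing Month is sketched below. It is not the maintainer's actual fix, and it makes several assumptions: the helper names `month_to_int` and `check_date_safe` are hypothetical, a missing or unrecognized month or day falls back to 1, and a standard-library `date` is returned instead of whatever `parse_date` produces. It also returns None when there is no Year at all, which can happen when PubDate carries only a MedlineDate.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical lookup for month names such as "Mar" or "March".
MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def month_to_int(text):
    """Map 'Mar', 'March' or '03' to 1-12; fall back to 1 when missing or unrecognized."""
    if not text:
        return 1
    text = text.strip()
    if text.isdigit():
        number = int(text)
        return number if 1 <= number <= 12 else 1
    return MONTHS.get(text[:3].lower(), 1)


def check_date_safe(element):
    """Build a date from Year/Month/Day children, defaulting missing parts to 1."""
    year = element.findtext('Year')
    if year is None:
        return None  # assumption: the caller decides what to do without a Year
    month = month_to_int(element.findtext('Month'))
    day = int(element.findtext('Day') or 1)
    return date(int(year), month, day)


# A PubDate with only a Year, as in the report above.
pub_date = ET.fromstring('<PubDate><Year>2023</Year></PubDate>')
print(check_date_safe(pub_date))  # 2023-01-01
```

Defaulting to January 1 keeps the record parseable but invents precision that the source does not have, so it may be worth keeping the raw Year string alongside the parsed value.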

@suqingdong
Owner

The bug has been fixed in version v1.0.1.

Author

mjafin commented Nov 24, 2023

@suqingdong cheers for the prompt fix, much appreciated. I found another issue in pubmed_xml/core/parser.py: if the Journal/ISSN element is not present, Article.find('Journal/ISSN') returns None, so Article.find('Journal/ISSN').attrib['IssnType'] raises an AttributeError. I made a dummy fix at:
master...mjafin:pubmed_xml:master#diff-a27d47d226e680c1e795927eefc42106838f5022f6fd56a814b6949f93547d07R62 using
Article.find('Journal/ISSN').attrib['IssnType'] if Article.find('Journal/ISSN') else 'NA'
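One caveat with the conditional above, assuming the parser uses the standard library's `xml.etree.ElementTree`: truth-testing an Element is deprecated and evaluates to False for an element with no child elements, which is the normal case for an ISSN tag that holds only text. An explicit `is not None` comparison avoids that. The sketch below uses a hypothetical helper name, `issn_type`, and keeps the 'NA' default from the comment above.

```python
import xml.etree.ElementTree as ET


def issn_type(article, default='NA'):
    """Return the IssnType attribute of Journal/ISSN, or `default` if absent."""
    issn = article.find('Journal/ISSN')
    # Compare with None explicitly: an Element without child elements is falsy,
    # so `if issn:` would skip a perfectly valid <ISSN> tag.
    if issn is None:
        return default
    return issn.attrib.get('IssnType', default)


with_issn = ET.fromstring(
    '<Article><Journal><ISSN IssnType="Electronic">1234-5678</ISSN></Journal></Article>')
without_issn = ET.fromstring('<Article><Journal/></Article>')
print(issn_type(with_issn))     # Electronic
print(issn_type(without_issn))  # NA
```

Looking the element up once and reusing the result also avoids calling `find` twice on the same path.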
