Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cicd: add basic precommit hooks #454

Merged
merged 19 commits into from
Oct 30, 2024
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 30 additions & 1 deletion .github/workflows/python-cqa.yml
korikuzma marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ jobs:
- uses: actions/cache@v4
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('pyproject.toml') }}
key: ${{ runner.os }}-pip-test-${{ hashFiles('pyproject.toml') }}
restore-keys: |
${{ runner.os }}-pip-
- name: Set up Python ${{ matrix.python-version }}
Expand All @@ -40,3 +40,32 @@ jobs:
- name: Test with pytest
run: |
python -m pytest

precommit_hooks:
runs-on: ubuntu-latest
strategy:
matrix:
hook:
- cmd: "end-of-file-fixer"
- cmd: "trailing-whitespace"
- cmd: "mixed-line-ending"
steps:
- uses: actions/checkout@v4
- uses: actions/cache@v4
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-precommit-${{ hashFiles('pyproject.toml') }}
restore-keys: |
${{ runner.os }}-pip-
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: 3.12
- name: Update pip and setuptools
run: |
python -m pip install --upgrade pip setuptools
- name: Install dependencies
run: |
pip install -e .[dev]
- name: Run pre-commit ${{ matrix.hook.cmd }} hook
run: pre-commit run ${{ matrix.hook.cmd }} --all-files
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -23,4 +23,4 @@ lint
uta_*.pgd.gz
.vscode
*.log
.idea
.idea
14 changes: 14 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-added-large-files
- id: detect-private-key
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-merge-conflict
- id: detect-aws-credentials
args: [ --allow-missing-credentials ]
- id: mixed-line-ending
args: [ --fix=lf ]
minimum_pre_commit_version: 4.0.1
3 changes: 2 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,8 @@ venv/%:
#=> develop: install package in develop mode
.PHONY: develop setup
develop setup:
pip install -e .[dev,extras,notebooks]
pip install -e .[dev,extras,notebooks]; \
pre-commit install

#=> devready: create venv, install prerequisites, install pkg in develop mode
.PHONY: devready
Expand Down
2 changes: 1 addition & 1 deletion docs/setup_help/m1_mac_setup.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,4 +46,4 @@ export PKG_CONFIG_PATH="/opt/homebrew/opt/openssl@1.1/lib/pkgconfig:/opt/homebre
14. Run the make devready command:
1. `make devready`
15. Run the make test command:
1. `make test`
1. `make test`
6 changes: 3 additions & 3 deletions docs/setup_help/uta_installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,15 +8,15 @@
4. Create roles for the application, give login and CREATEDB permissions:
1. `CREATE ROLE uta_admin WITH LOGIN CREATEDB;`
2. `CREATE ROLE anonymous WITH LOGIN CREATEDB;`
5. List the users using the command
5. List the users using the command
1. `\du`
6. Create the UTA Database object
1. `CREATE DATABASE uta;`
7. Grant privileges to manage the database to uta_admin
1. `GRANT ALL PRIVILEGES ON DATABASE uta TO uta_admin;`
8. Exit postgres
1. `\q`
9. Download the UTA database and place it in the uta database object that you created before (**This step takes around 5 hours**).
9. Download the UTA database and place it in the uta database object that you created before (**This step takes around 5 hours**).
1. `export UTA_VERSION=uta_20210129.pgd.gz\ncurl -O http://dl.biocommons.org/uta/$UTA_VERSION\ngzip -cdq ${UTA_VERSION} | psql -h localhost -U uta_admin --echo-errors --single-transaction -v ON_ERROR_STOP=1 -d uta -p 5432`
10. Set your UTA path
1. `export UTA_DB_URL=postgresql://uta_admin@localhost:5432/uta/uta_20210129`
Expand All @@ -30,4 +30,4 @@ gzip -cdq ${UTA_VERSION} | grep -v "^REFRESH MATERIALIZED VIEW" | psql -h localh
1. `REFRESH MATERIALIZED VIEW uta_20210129.exon_set_exons_fp_mv;`
2. `REFRESH MATERIALIZED VIEW uta_20210129.tx_exon_set_summary_mv;`
3. `REFRESH MATERIALIZED VIEW uta_20210129.tx_def_summary_mv;`
4. `REFRESH MATERIALIZED VIEW uta_20210129.tx_similarity_mv;` #**This step will take 5 or more hours**
4. `REFRESH MATERIALIZED VIEW uta_20210129.tx_similarity_mv;` #**This step will take 5 or more hours**
5 changes: 0 additions & 5 deletions notebooks/getting_started/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,8 +42,3 @@ we show how to transform basic variants to VRS, and in some cases, back to the o
The final notebook of this series,
[Exploring the CNV Translator](5_Exploring_the_CnvTranslator.ipynb) details transformations
of various forms of copy number variation to their VRS representations.





1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ dev = [
"vcrpy",
"pyyaml",
# style
"pre-commit>=4.0.1",
"pylint",
"yapf",
# docs
Expand Down
4 changes: 2 additions & 2 deletions src/ga4gh/vrs/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -703,7 +703,7 @@ class ga4gh(_ValueObject.ga4gh):
'component',
'orientation',
'type'
]
]

class DerivativeMolecule(_VariationBase):
"""The "Derivative Molecule" data class is a structure for describing a derivate
Expand All @@ -716,7 +716,7 @@ class DerivativeMolecule(_VariationBase):
description="The traversal block components that make up the derivative molecule.",
min_length=2
)
circular: Optional[bool] = Field(None, description="A flag indicating if the derivative molecule is circular (true) or linear (false).")
circular: Optional[bool] = Field(None, description="A flag indicating if the derivative molecule is circular (true) or linear (false).")

class ga4gh(_Ga4ghIdentifiableObject.ga4gh):
prefix = "DM"
Expand Down
35 changes: 17 additions & 18 deletions src/ga4gh/vrs/utils/hgvs_tools.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ def __init__(self,data_proxy: _DataProxy = None):
self.variant_mapper = hgvs.variantmapper.VariantMapper(self.uta_conn)
self.data_proxy = data_proxy



def close(self):
# TODO These should only be closed if they are owned by this instance
Expand All @@ -57,7 +57,7 @@ def parse(self, hgvs_str):
if not self.hgvs_re.match(hgvs_str):
return None
return self.parser.parse_hgvs_variant(hgvs_str)

def is_intronic(self, sv: hgvs.sequencevariant.SequenceVariant):
"""
Checks if the given SequenceVariant is intronic.
Expand All @@ -71,13 +71,13 @@ def is_intronic(self, sv: hgvs.sequencevariant.SequenceVariant):
if isinstance(sv.posedit.pos, hgvs.location.BaseOffsetInterval):
return (sv.posedit.pos.start.is_intronic or sv.posedit.pos.end.is_intronic)
return False


def get_edit_type(self, sv: hgvs.sequencevariant.SequenceVariant):
if sv is None or sv.posedit is None or sv.posedit.edit is None:
return None
return sv.posedit.edit.type

def get_position_and_state(self, sv: hgvs.sequencevariant.SequenceVariant):
"""
Get the details of a sequence variant.
Expand Down Expand Up @@ -150,14 +150,14 @@ def extract_allele_values(self, hgvs_expr: str):
sv = self.parse(hgvs_expr)
if not sv:
return None

if self.is_intronic(sv):
raise ValueError("Intronic HGVS variants are not supported")

refget_accession = self.data_proxy.derive_refget_accession(sv.ac)
if not refget_accession:
return None

# translate coding coordinates to positional coordinates, if necessary
if sv.type == "c":
sv = self.c_to_n(sv)
Expand All @@ -166,7 +166,7 @@ def extract_allele_values(self, hgvs_expr: str):

retval = {"refget_accession": refget_accession, "start": start, "end": end, "literal_sequence": state}
return retval

def from_allele(self, vo, namespace=None):
"""generates a *list* of HGVS expressions for VRS Allele.
Expand All @@ -188,15 +188,15 @@ def from_allele(self, vo, namespace=None):

if vo is None:
return []
if not isinstance(vo, models.Allele):
if not isinstance(vo, models.Allele):
raise ValueError("VRS object must be an Allele")
if vo.location is None:
raise ValueError("VRS allele must have a location")

refget_accession = vo.location.get_refget_accession()
if refget_accession is None:
raise ValueError("VRS allele location must have a sequence reference")

sequence = f"ga4gh:{refget_accession}"
aliases = self.data_proxy.translate_sequence_identifier(sequence, namespace)

Expand All @@ -218,9 +218,9 @@ def from_allele(self, vo, namespace=None):
# create the hgvs expression object
var = self._to_sequence_variant(vo, sequence_type, sequence, accession)
hgvs_exprs += [str(var)]

return list(set(hgvs_exprs))

def _to_sequence_variant(self, vo, sequence_type, sequence, accession):
"""Creates a SequenceVariant object from an Allele object."""
# build interval and edit depending on sequence type
Expand Down Expand Up @@ -253,24 +253,24 @@ def _to_sequence_variant(self, vo, sequence_type, sequence, accession):
# this will subsequently be converted back to `c.` after hgvs normalization
type='n' if sequence_type == 'c' else sequence_type,
posedit=posedit)

try:
# if the namespace is GRC, can't normalize, since hgvs can't deal with it
parsed = self.parse(str(var))
var = self.normalize(parsed)

# if sequence_type is coding, convert from "n." to "c." before continuing
if sequence_type == "c":
var = self.n_to_c(var)

except hgvs.exceptions.HGVSDataNotAvailableError:
_logger.warning(f"No data found for accession {accession}")

return var

def normalize(self, hgvs):
return self.normalizer.normalize(hgvs)

def n_to_c(self, hgvs):
return self.variant_mapper.n_to_c(hgvs)

Expand Down Expand Up @@ -312,4 +312,3 @@ def c_to_n(self, hgvs):
hgvs_expr = hgvsTools.from_allele(vrs_allele, namespace="refseq")

print(hgvs_expr)

1 change: 0 additions & 1 deletion submodules/README.md
korikuzma marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
This directory contains submodules required by vrs-python. If you
don't see a vrs directory here, please reread instructions at
https://github.com/ga4gh/vrs-python#installing-for-development.

2 changes: 1 addition & 1 deletion tests/extras/data/test_vcf_input.vcf
Original file line number Diff line number Diff line change
Expand Up @@ -235,4 +235,4 @@ chr19 28946400 . T C 50 PASS platforms=5;platformnames=Illumina,PacBio,CG,10X,So
chr19 490414 . ACT A 50 PASS platforms=5;platformnames=Illumina,PacBio,CG,10X,Solid;datasets=5;datasetnames=HiSeqPE300x,CCS15kb_20kb,CGnormal,10XChromiumLR,SolidSE75bp;callsets=7;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,CCS15kb_20kbGATK4,CGnormal,HiSeqPE300xfreebayes,10XLRGATK,SolidSE75GATKHC;datasetsmissingcall=IonExome;callable=CS_HiSeqPE300xGATK_callable,CS_CCS15kb_20kbDV_callable,CS_CCS15kb_20kbGATK4_callable,CS_CGnormal_callable,CS_HiSeqPE300xfreebayes_callable;filt=CS_10XLRGATK_filt GT:PS:DP:ADALL:AD:GQ 0/1:.:821:163,158:239,220:1004
chr19 54220024 . G *,A 50 PASS platforms=1;platformnames=PacBio;datasets=1;datasetnames=CCS15kb_20kb;callsets=1;callsetnames=CCS15kb_20kbGATK4;datasetsmissingcall=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR,CGnormal,IonExome,SolidSE75bp;callable=CS_CCS15kb_20kbGATK4_callable;filt=CS_CCS15kb_20kbDV_filt,CS_10XLRGATK_filt,CS_HiSeqPE300xfreebayes_filt;difficultregion=HG001.hg38.300x.bam.bilkentuniv.010920.dups,hg38.segdups_sorted_merged GT:PS:DP:ADALL:AD:GQ 1/2:.:45:0,20,25:0,20,25:99
chr19 54220999 . A T 50 PASS platforms=1;platformnames=PacBio;datasets=1;datasetnames=CCS15kb_20kb;callsets=1;callsetnames=CCS15kb_20kbGATK4;datasetsmissingcall=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR,CGnormal,IonExome,SolidSE75bp;callable=CS_CCS15kb_20kbGATK4_callable;filt=CS_CCS15kb_20kbDV_filt,CS_10XLRGATK_filt,CS_HiSeqPE300xfreebayes_filt;difficultregion=HG001.hg38.300x.bam.bilkentuniv.010920.dups,hg38.segdups_sorted_merged GT:PS:DP:ADALL:AD:GQ 0/1:.:45:0,20,25:0,20,25:99
chr19 54221654 . T A,P 50 PASS platforms=1;platformnames=PacBio;datasets=1;datasetnames=CCS15kb_20kb;callsets=1;callsetnames=CCS15kb_20kbGATK4;datasetsmissingcall=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR,CGnormal,IonExome,SolidSE75bp;callable=CS_CCS15kb_20kbGATK4_callable;filt=CS_CCS15kb_20kbDV_filt,CS_10XLRGATK_filt,CS_HiSeqPE300xfreebayes_filt;difficultregion=HG001.hg38.300x.bam.bilkentuniv.010920.dups,hg38.segdups_sorted_merged GT:PS:DP:ADALL:AD:GQ 0/1:.:45:0,20,25:0,20,25:99
chr19 54221654 . T A,P 50 PASS platforms=1;platformnames=PacBio;datasets=1;datasetnames=CCS15kb_20kb;callsets=1;callsetnames=CCS15kb_20kbGATK4;datasetsmissingcall=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR,CGnormal,IonExome,SolidSE75bp;callable=CS_CCS15kb_20kbGATK4_callable;filt=CS_CCS15kb_20kbDV_filt,CS_10XLRGATK_filt,CS_HiSeqPE300xfreebayes_filt;difficultregion=HG001.hg38.300x.bam.bilkentuniv.010920.dups,hg38.segdups_sorted_merged GT:PS:DP:ADALL:AD:GQ 0/1:.:45:0,20,25:0,20,25:99