
Merge branch 'master' of github.com:WGLab/PennCNV
kaichop committed Dec 24, 2016
2 parents f08555c + e4463d7 commit 08e6c03
Showing 5 changed files with 54 additions and 11 deletions.
8 changes: 6 additions & 2 deletions docs/index.md
@@ -4,15 +4,19 @@ PennCNV is a free software tool for Copy Number Variation (CNV) detection from S

PennCNV implements a hidden Markov model (HMM) that integrates multiple sources of information to infer CNV calls for individual genotyped samples. It differs from segmentation-based algorithms in that it considers the SNP allelic ratio distribution as well as other factors, in addition to signal intensity alone. In addition, PennCNV can optionally utilize family information to generate family-based CNV calls by several different algorithms. Furthermore, PennCNV can generate CNV calls given a specific set of candidate CNV regions, through a validation-calling algorithm.

This website is built for the "original" Perl/C-based PennCNV developed for SNP arrays (see references below). Other tools of the PennCNV family include [PennCNV2](http://sourceforge.net/projects/penncnv-2/) (C++ based PennCNV for tumor/NGS data) and PennCNV3 (Hadoop-based PennCNV for NGS data).
This website is built for the "original" Perl/C-based PennCNV developed for SNP arrays (see references below). Other tools of the PennCNV family include [PennCNV2](https://github.com/WGLab/PennCNV2/) (C++ based PennCNV for tumor/NGS data) and [PennCNV3](https://github.com/WGLab/HadoopCNV) (Java/Hadoop-based PennCNV for NGS data).

## What's new:

- 20161024: Prof. George Kirov shared a recently published paper reporting the use of PennCNV on Affy Axiom arrays [here](http://www.sciencedirect.com/science/article/pii/S0006322316327111). A detailed procedure for CNV calling is given in the supplementary materials.
- 20160805: PennCNV has been dockerized by Roman Hillje at the University of Zurich, Switzerland. The docker image and related documentation are available at https://hub.docker.com/r/romanhaa/penncnv/.

## Reference:

- Wang K, Li M, Hadley D, Liu R, Glessner J, Grant S, Hakonarson H, Bucan M. [PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data](http://genome.cshlp.org/cgi/content/short/17/11/1665) _**Genome Research**_ 17:1665-1674, 2007
- Diskin SJ, Li M, Hou C, Yang S, Glessner J, Hakonarson H, Bucan M, Maris JM, Wang K. [Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms](http://nar.oxfordjournals.org/cgi/content/short/36/19/e126) **_Nucleic Acids Research_** 36:e126, 2008
- Wang K, Chen Z, Tadesse MG, Glessner J, Grant SFA, Hakonarson H, Bucan M, Li M. [Modeling genetic inheritance of copy number variations](http://nar.oxfordjournals.org/cgi/content/short/36/21/e138) _**Nucleic Acids Research**_ 36:e138, 2008


Click the menu to the left to navigate through the PennCNV website.


4 changes: 2 additions & 2 deletions docs/misc/faq.md
@@ -26,7 +26,7 @@

1. **How to use genomic wave adjustment independent of CNV calling?**

Some users just want to adjust signal intensity values without generating CNV calls by PennCNV. The genomic_wave.pl program in the PennCNV package can be used to adjust signal intensity values. The input file must have a field in the header line that says "*.Log R Ratio". The -adjust argument can be used to generate a new file with updated Log R Ratio measures. This procedure can also be used on Agilent or NimbleGen arrays. Email me for a script to generate a GC model file for these custom arrays.
Some users just want to adjust signal intensity values without generating CNV calls by PennCNV. The genomic_wave.pl program in the PennCNV package can be used to adjust signal intensity values. The input file must have a field in the header line that says "*.Log R Ratio". The -adjust argument can be used to generate a new file with updated Log R Ratio measures. This procedure can also be used on Agilent or NimbleGen arrays. Use the `cal_gc_snp.pl` script to generate a GC model file for these custom arrays; a sketch of the full workflow is shown below.
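
A minimal sketch of that workflow, assuming placeholder file names and that the UCSC gc5Base annotation has already been downloaded (check each script's own documentation for the exact options):

```
# sort the UCSC gc5Base annotation by chromosome and position, then build a
# GC model for the custom array and adjust the signal file for genomic waves
# (file names are placeholders; option spellings may differ between versions)
sort -k 2,2 -k 3,3n gc5Base.txt > gc5Base.sorted.txt
cal_gc_snp.pl gc5Base.sorted.txt custom_array.pfb -output custom_array.gcmodel
genomic_wave.pl -adjust -gcmodel custom_array.gcmodel signalfile.txt
```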

1. **A sample generates >1000 CNV calls, what's wrong?**

@@ -165,7 +165,7 @@ rs109702 16 6508957 BB -0.001872403 0.9723207
1. **How to identify a subset of the most confident de novo CNV calls?**
The 2009Aug27 version of PennCNV added a script for validating de novo CNVs and assigning P-values to de novo calls. If you want to know whether a particularly interesting de novo CNV is real or not, or if you want to select a set of the most confident de novo CNVs for experimental validation, then this program should definitely be used. Check it out here.
The 2009Aug27 version of PennCNV added a script for validating de novo CNVs and assigning P-values to de novo calls. If you want to know whether a particularly interesting de novo CNV is real or not, or if you want to select a set of the most confident de novo CNVs for experimental validation, then this program should definitely be used. Check it out [here](../user-guide/denovo.md).
1. **Can PennCNV give allele-specific CNV calls?**
39 changes: 37 additions & 2 deletions docs/user-guide/affy.md
@@ -60,6 +60,20 @@ Follow the same procedure as 500K array, but download the specific cdf files fro
[kaiwang@cc ~/]$ apt-probeset-genotype -c CD_Mapping50K_Xba240_rev3/Full/Mapping50K_Xba240/LibFiles/Mapping50K_Xba240.CDF --chrX-snps CD_Mapping50K_Xba240_rev3/Full/Mapping50K_Xba240/LibFiles/Mapping50K_Xba240.chrx --out-dir apt_xba *.CEL
```

**Axiom array**

The instructions below were provided by Professor George Kirov at Cardiff University.

Using APT: `apt-probeset-genotype --analysis-files-path Axiom_UKB_WCSG.xml --out-dir Batch1 --summaries --cel-files list_CEL_files.txt` (the user has to choose the appropriate `Axiom_UKB_WCSG.xml` file for their analysis). Follow the instructions provided by APT.

The command generates four output files:
* AxiomGT1.calls.txt (the genotype calls)
* AxiomGT1.confidences.txt (confidences for the genotype calls)
* AxiomGT1.report.txt (various summaries for the samples analyzed, including the computed gender, call rate and heterozygosity)
* AxiomGT1.summary.txt (the allele-level signal summaries used in the PennCNV-Affy steps below)

The manuscript reporting the above procedure has been published [here](http://www.sciencedirect.com/science/article/pii/S0006322316327111); the detailed procedure for CNV calling on the Axiom array can be found in its supplementary materials.

### - Substep 1.2 Allele-specific signal extraction from CEL files

This step uses the Affymetrix Power Tools software to extract allele-specific signal values from the raw CEL files. Here `allele-specific` refers to the fact that for each SNP, we have a signal measure for the A allele and a separate signal measure for the B allele.
@@ -158,15 +158,36 @@ Similar command as genome-wide arrays should be used for Nsp and Sty array separ

Same as above. Get the PFB file here. It functions both as a --locfile in the command line above, and as a --pfbfile in CNV calling later on.


**Axiom array**

Next, use PennCNV-Affy:

```
generate_affy_geno_cluster.pl AxiomGT1.calls.txt AxiomGT1.confidences.txt AxiomGT1.summary.txt --nopower2 -locfile mapfile.dat -sexfile sex_batch1.txt -out batch1.genocluster
```

Please note the `-nopower2` argument above. The signal intensity values have not been log2 normalized, so the `-nopower2` argument is needed.

Then follow the PennCNV-Affy workflow.

### - Substep 1.4 LRR and BAF calculation

This step uses the allele-specific signal intensity measures generated in the last step to calculate the Log R Ratio (LRR) values and the B Allele Frequency (BAF) values for each marker in each individual. The normalize_affy_geno_cluster.pl program in the downloaded PennCNV-Affy package (see the gw6/bin/ directory) is used:
This step uses the allele-specific signal intensity measures generated in the last step to calculate the Log R Ratio (LRR) values and the B Allele Frequency (BAF) values for each marker in each individual. The `normalize_affy_geno_cluster.pl` program in the downloaded PennCNV-Affy package (see the gw6/bin/ directory) is used:

```
[kai@cc ~/]$ normalize_affy_geno_cluster.pl gw6.genocluster quant-norm.pm-only.med-polish.expr.summary.txt -locfile ../lib/affygw6.hg18.pfb -out gw6.lrr_baf.txt
```

The above command generates LRR and BAF values from the summary file and the cluster file called gw6.genocluster generated in the previous steps. The location file specifies the chromosome position of each SNP or CN probe, and this information is printed in the output files as well to facilitate future data processing.
The above command generates LRR and BAF values from the summary file and the cluster file called `gw6.genocluster` generated in the previous steps. The location file specifies the chromosome position of each SNP or CN probe, and this information is printed in the output files as well to facilitate future data processing.
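
For reference, the location file (same format as the PFB file) is a simple tab-delimited text file with one marker per line giving the marker name, chromosome, position and population frequency of the B allele; the marker names and values below are purely illustrative:

```
Name            Chr   Position   PFB
SNP_A-0000001   1     1156131    0.5520
SNP_A-0000002   1     2234251    0.1113
CN_0000001      1     3254136    2
```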

For the Axiom array, this command is used:

```
normalize_affy_geno_cluster.pl batch1.genocluster AxiomGT1.summary.txt -nopower2 -locfile mapfileAX.dat -out batch1_lrr_baf.txt
```

Please note the `-nopower2` argument above. The signal intensity values have not been log2 normalized, so the `-nopower2` argument is needed.

On a typical modern computer, the command should take several hours to process files generated from 1000-2000 CEL files. A new tab-delimited file called gw6.lrr_baf.txt will be generated that contains one SNP per line and two columns per sample (an LRR column and a BAF column).
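
In the standard PennCNV-Affy workflow this combined file is subsequently split into one signal file per sample before CNV calling; a sketch following the gw6 example above (the column and header arguments may need adjusting for your file):

```
# split the combined LRR/BAF file into individual signal files, two columns
# (LRR and BAF) per sample, keeping the leading marker-information columns
# in every output file
kcolumn.pl gw6.lrr_baf.txt split 2 -tab -head 3 -name -out gw6
```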

12 changes: 8 additions & 4 deletions docs/user-guide/install.md
@@ -66,11 +66,15 @@ Now we can open a Windows command shell (click the start, then click Run, then t

Users need to recompile the source code for execution in Mac OS. This information was provided by Markus Ringner: in the Makefile, replace `-shared` with `-dynamiclib`, and replace `khmm.so` with `khmm.dylib`.
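
If you prefer a one-liner, the two substitutions can be made as below (this assumes the BSD sed shipped with macOS; editing the Makefile by hand works equally well):

```
# build a dynamic library instead of a shared object, and rename the target
sed -i '' -e 's/-shared/-dynamiclib/g' -e 's/khmm\.so/khmm.dylib/g' Makefile
make
```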

## Using Docker image for PennCNV

PennCNV has been dockerized by Roman Hillje at the University of Zurich, Switzerland. The docker image and related documentation are available at https://hub.docker.com/r/romanhaa/penncnv/. Please refer to the website for detailed instructions.
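
A minimal sketch of getting started with that image (the exact entry point and in-container paths are assumptions here; the Docker Hub page has the authoritative instructions):

```
# pull the image, then run a PennCNV script inside the container,
# mounting the current directory so input/output files are visible;
# the in-container working directory and script path may differ
docker pull romanhaa/penncnv
docker run --rm -v "$PWD":/data -w /data romanhaa/penncnv detect_cnv.pl --help
```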

## Compilation from source

Unless you are using Windows, compilation from source is always recommended.

If you are using a very old machine with a very old version of the make program, the special characters like $@ and $^ in the Makefile may not be recognized correctly. If thats the case, just manually change these characters to the appropriate file names.
If you are using a very old machine with a very old version of the make program, the special characters like $@ and $^ in the Makefile may not be recognized correctly. If that is the case, just manually change these characters to the appropriate file names.

**Compilation in Cygwin**: change khmm.so to khmm.dll in the Makefile before compilation.

@@ -137,9 +141,9 @@ regtool -v list /HKLM/Software/Cygnus\ Solutions/Cygwin

**Symptom**: when typing "make" to compile the program, gcc complains that "/usr/bin/ld: cannot find -lperl"

**Solution**: The error is caused by the fact that Perl is not installed in a standard way, so the path to libperl is not listed in the output of the "perl -MExtUtils::Embed -e ldopts" command.
If you run "perl -V", you can try to find something like "/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE" in the dynamic linking section. Now make sure that the "/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/libperl.so" file actually exists. Then in the Makefile, just add "-L/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/" to the link command, and it should fix the problem.
**Solution**: The error is caused by the fact that Perl is not installed in a standard way, so the path to libperl is not listed in the output of the "perl -MExtUtils::Embed -e ldopts" command. (On Ubuntu, this issue can sometimes be solved simply by `sudo apt-get install libperl-dev`.)

If you run "perl -V", you can try to find something like "/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE" in the dynamic linking section. Now make sure that the "/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/libperl.so" file actually exists. Then in the Makefile, just add "-L/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/" to the link command, and it should fix the problem.

To give a clearer example, suppose that after running "perl -V" you know that the "libperl.so" library file is in the "/usr/lib" folder. But if you run "perl -MExtUtils::Embed -e ldopts" you find the following: "-Wl,-E -L/usr/local/lib -L/usr/lib/perl/5.10/CORE -lperl -ldl -lm -lpthread -lc -lcrypt", so /usr/lib is not listed here.
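
A quick way to double-check where libperl actually lives before editing the Makefile (the paths below are illustrative; use whatever directory `perl -V` reports on your system):

```
# locate libperl.so; the directory printed here is what you append as -L<dir>
find /usr/lib /usr/lib64 -name 'libperl.so*' 2>/dev/null
# in the example above it is /usr/lib, so the Makefile link command would get
# an extra "-L/usr/lib" added to it
```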

2 changes: 1 addition & 1 deletion docs/user-guide/test.md
@@ -76,7 +76,7 @@ Note: As of June 2008, the --medianadjust argument is turned on by default in t

As we can see from the log message, first the program reads the PFB file, and it discards 178 records on chr M, Y and XY. Next the program reads signal information for autosomes from the `sample1.txt` file, and discards 178 records in the signal file (the reason is that these records/SNPs are not annotated in the PFB file). Note that a total of 561288 markers are read into PennCNV for analysis: this sounds right for Illumina 550K arrays. For the Affy GW6 or GW5 array, this number should be around 1.8 million or 800K; for the Illumina 1M array, this number should be around 1M. If it is substantially lower than the expected number, then something is wrong and the signal files are not being read correctly. Examine the signal files to see how many lines (markers) they have, and whether the last line is complete, to check for the possibility of file corruption or truncation during generation.

Next the program prints out a sample quality summary: this information is very useful for quality control (see http://www.neurogenome.org/cnv/penncnv/qc_tutorial.htm for more detail). If some of the quality measures look bad, a warning message will be printed indicating that the sample does not pass the quality control criteria. CNV calls will still be made for all samples regardless of quality, but users are advised to take caution in analyzing CNV calls from low-quality samples.
Next the program prints out a sample quality summary: this information is very useful for quality control. If some of the quality measures look bad, a warning message will be printed indicating that the sample does not pass the quality control criteria. CNV calls will still be made for all samples regardless of quality, but users are advised to take caution in analyzing CNV calls from low-quality samples.

## GCmodel adjustment

