NEWS

Version 1.1.6

* Use 'mandir' variable to handle man page
  installation. This will be useful when
  packaging for distributions - like debian.

* Fix an english typo in usage.rst.

Version 1.1.5

* Now, by default, compile and install a man
  page alongside the sider executable when
  sphinx-build is present.

* Add CodeQL to test vulnerabilities in code.

* Add a security policy.

Version 1.1.4

* Fix an english typo in cluster.c (cluster).

* Fix readthedocs build.

* Update ubuntu image version for docker and
  github workflows. Use ubuntu 24.04.

Version 1.1.3

* Improve memory usage when reading unsorted,
  non-collated, alignment files. To solve
  running unsorted file, read it three time:
  Firstly index all abnormal read querynames
  into a hash table, then remove low-quality
  fragments and finally dump all indexed
  fragments to the database.

Version 1.1.2

* Fix a possible vulnerability in 'strncat'
  and 'strcpy' inside a loop, replacing them
  with 'memcpy'.

* Use multi-stage build to decrease the final
  docker image size.

Version 1.1.1

* Fix testing. Point the right 'libcheck'
  version (>= 0.15.0) in 'meson.build'.

Version 1.1.0

* Fix bug in 'merge-call' subcommand when
  merging databases with the option 'in-place'
  active.

* Fix testing to handle SIGABRT and SIGSEGV.

* Move from Travis-CI to Github Actions. Test
  the code, build and deploy to Dockerhub.

* Update Dockerfile base image to ubuntu
  version 20.04.

Version 1.0.0

* Now sideRETRO works with the CRAM file
  format too.

* Fix bug when reading GTF, possibly GFF3
  as well, and there is a space into the attr
  value: gene_name "My gene name". Now, the
  attr will be correctly splited in:
  key=gene_name and value='My gene name'.

* Add citation option. Print it in BibTeX
  format.

* Add 'gz' interface, in order to wrapper
  'libz' library and remove duplicated
  code.

* Add the copying notices header to each
  source script.

Version 0.14.1

* Add 'git' to Dockerfile. It will avoid
  errors when picking the right version.

* Fix the default value for 'phred-quality'.

* Fix building: 'config.h' is dynamically
  generated from template 'config.h.in', so
  it need to be made before compiling. In order
  to assure the required order, declare
  'config.h' as a dependency of library and
  executable.

Version 0.14.0

* Implement the genotype estimation by using
  the likelihood approach as it is defined in
  Heng Li paper: "A statistical framework for
  SNP calling, mutation discovery, association
  mapping and population genetical parameter
  estimation from sequencing data - 2.2 (eq2)".

* Add reference and alternate depth counts to
  VCF's info.

* Evidence for reads covering the reference
  allele is calculated now by the overlapping
  between read range and insertion point +/-
  read half decil. It is necessary in order to
  avoid overestimation of reads covering the
  reference allele due to mapping errors.

* Fix when two overlapped clusters share the
  same parental gene - maybe the edge points are
  reachable, not core points (DBSCAN) - it would
  be annotated as overlapped parentals. Now it
  will be annotated as PASS and the parental gene
  name wont be duplicated (e.g. PTEN/PTEN).

* Add Docker image for this project.

* Add documentation using Sphinx. So far, it is
  ready the topics about installation, usage,
  molecular gene biology method and simulation
  results.

Version 0.13.0

* Improve usage help by splitting all options
  according to its category.

* Log all user-given options and arguments.

* Add processing text interface to read 'FASTA'
  file format.

* Add VCF's writer interface. It was necessary
  to create new INFO and ALT tags for handle
  informations related to the retrocopy as:
  'PG' for parental gene, 'PGTYPE' for
  parental gene type, EXONIC/INTRONIC/NEAR
  for retrocopy genomic position - all those
  tags for INFO - and <INS:ME:RTC> for ALT.

* Add subcommand 'make-vcf' in order to
  manage VCF's file generation.

Version 0.12.0

* Add genotype and haplotype analysis. If
  the BAM index is found, perform a fast
  search for each retrocopy region, else
  index all regions inside an intervalar
  tree and make a slow linear search all
  over the file.

* To perform genotype and haplotype analysis,
  is required to go back to the BAM files -
  whose path is saved inside 'source' table.
  So, it is necessary that the user maintain
  the files where the program expects to find
  them.

* Remove bwa subproject, because it won't be
  used any longer.

Version 0.11.0

* Add retrocopy annotation: Insertion window
  and point, orientation rho and p-value,
  level PASS, OVERLAPPED, NEAR, HOTSPOT,
  AMBIGUOUS.

* Add deduplication capability. Remove
  duplicated reads but one, which is called
  the primary read. Other tools, specilized
  in remove duplications, use some metric to
  choose the primary reads. For us, it is just
  interesting the coordinates, so the primary
  reads are choosen randomly - mostly the
  first one to appear.

* There is the possibility that different BAM
  files share reads with the same query name.
  In order to avoid a mess to find the right
  mate, use the source_id along with qname when
  required to match reads from the same fragment.

Version 0.10.0

* Add reclustering step in order to filtering
  low number of reads comming from a given
  source (BAM). When those reads are removed,
  may occur that the cluster become rarefied,
  and therefore, invalid according to DBSCAN
  constraints.

* Add processing text interface to read 'BED'
  file format.

* Add blacklist interface and tables blacklist,
  overlapping_blacklist. The interface was
  inspired in 'exon.c' way. Also add more
  command-line options for indexing blacklisted
  regions from GTF/GFF3/BED files.

* Add gff filtering capabilities. The user must
  initiate a type GffFilter with the feature_type
  to be filtered (e.g. gene, transcript, exon)
  and may add attributes aswell (e.g.
  gene_type=protein_coding). The attributes are
  hard and soft - which mean, hard attributes
  must all match the pattern (AND); soft
  attributes, at least, must match one
  pattern (OR).

* Cluster filter: NONE, CHR, DIST, REGION, SUPPORT.
  The philosophy now is to keep all clusters and
  subclusters and add a new column to handle
  the filtering steps.

Version 0.9.0

* Add options '--blacklist-chr' and
  '--parental-distance' to 'merge-call' command.
  This way, it's possible to filter reads from
  chromosomes (as chrM), and avoid clustering
  inside, or very near, its own parental gene.

* 'DBSCAN' is not entirely deterministic: border
  points that are reachable from more than one
  cluster can be part of either cluster,
  depending on the order the data are processed
  (Wikipedia).
  Fix the 'REACHABLE' points which are shared
  among multiple clusters.

* Add statistics 'pearson' and 'spearman'
  correlation brought from 'GSL'. They will be
  necessary for calculating the retrocopy
  orientation.

Version 0.8.0

* Add new extensible hash algorithm. Move from
  chaining hash type to extensible hash. Now it
  has no need to declare the hash size in 'new'
  function, because it is dynamically allocated.

* Add a 'set' interface based on 'hash', instead
  of 'list'. It was necessary for speed up the
  'DBSCAN' when making the union operation.

* In 'process-sample' command, add phred quality
  score filter option in order to avoid low mapped
  quality reads.

* Add SQL INDEX for alignment(qname) table for
  speed up query.

* Update clustering query statement in order to
  avoid the abnormal reads that fall into their
  own parental gene.

* Fix bug in 'exon.c'. The alignment_id was
  included inside ExonTree object, which in turn
  was shared among all threads. With no mutex,
  may occur shocking among all alignment_id values.
  In order to fix it, a new private struct keeps
  'ExonTree' and 'alignment_id' separately.

* Move from 'Autotools' to 'Meson build system'.

Version 0.7.0

* Change abnormal interface to handle sorted
  and unsorted SAM/BAM files.

* processing-sample subcommand automatically
  detects if the alignment file is queryname
  sorted or not and then choose the right
  abnormal interface.

* Add a new table 'schema' with a single column
  called 'version'. Its function is to keep
  track of the database schema state in a
  versioned way.

Version 0.6.0

* Add DBSCAN algorithm - Density-Based Spatial
  Clustering of Applications with Noise. Its
  purpose is making an one-dimensional
  groupping of all reads per chromosome.

* Add CLI - Command Line Interface - based
  in subcommands: For now, there are two
  subcommands: processing-sample (alias
  ps) and merge-call (alias mc).

* Add low-level wrappers on SQLite3 functions
  in order to avoid so many testing against
  each statement.

* Improve testing coverage with gcov, lcov
  and COVERALLS platform.

Version 0.5.0

* Add a 'str' interface to handle string
  memory manipulation more efficiently.

* 'ibitree' lookup mechanism reshaped: Besides
  the range to search for, it acceps now node,
  interval overlapped fractions and the bitwise
  boolean (AND or OR) for testing if both, node
  and interval must overlap each other at the
  same percentage - or if just one being true
  is enough. Also, 'ibitree' keeps track of the
  position and length of overlapping regions.

* Add 'thpool' interface for thread pool.

* Add 'SQLite3' library to manage intermediate
  and possibly final results.

* 'abnormal' filtering is working. It selects
  the so called abnormal alignments: Thoses
  alignments, whose each sequenced read of the
  fragment falls into different chromosomes, or
  is far way from its mate. Also it catches
  supplementary alignments. All the results
  are recorded into the 'SQLite3' database.

* Add 'exon' interface. Its job is seach for
  abnormal alignments that overlap some exon.

Version 0.4.0

* Add an 'align' interface to save 'bwa mem'
  output in 'bwa' format.

* Borrow the 'samtools sort' algorithm and
  implement it at sam_sort interface.

* Add a 'binary tree' data structure and, on top
  of it, implement an 'AVL interval tree' to handle
  searching genomic annotation positons more
  efficiently.

* Add processing text interface to read 'gff/gtf'
  file format.

* Initiate 'abnormal' filter algorithm.

Version 0.3.0

* Use 'git submodule' to handle local building of
  'htslib' and 'bwa'. Those codes are statically
  linked against our software.

* Add 'bwa mem' and 'sam' wrappers.

Version 0.2.0

* Add array, list and hash data structures.

* Add logging interface to manage messages level.

* Wrapping of standard c functions for allocating
  memory and opening file.

Version 0.1.0

* Initial version. Project was set to work with
  'autotools' and testing with the framework 'check'.