-
Notifications
You must be signed in to change notification settings - Fork 5
/
NEWS
383 lines (281 loc) · 10.6 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
Version 1.1.6
* Use 'mandir' variable to handle man page
installation. This will be useful when
packaging for distributions - like debian.
* Fix an english typo in usage.rst.
Version 1.1.5
* Now, by default, compile and install a man
page alongside the sider executable when
sphinx-build is present.
* Add CodeQL to test vulnerabilities in code.
* Add a security policy.
Version 1.1.4
* Fix an english typo in cluster.c (cluster).
* Fix readthedocs build.
* Update ubuntu image version for docker and
github workflows. Use ubuntu 24.04.
Version 1.1.3
* Improve memory usage when reading unsorted,
non-collated, alignment files. To solve
running unsorted file, read it three time:
Firstly index all abnormal read querynames
into a hash table, then remove low-quality
fragments and finally dump all indexed
fragments to the database.
Version 1.1.2
* Fix a possible vulnerability in 'strncat'
and 'strcpy' inside a loop, replacing them
with 'memcpy'.
* Use multi-stage build to decrease the final
docker image size.
Version 1.1.1
* Fix testing. Point the right 'libcheck'
version (>= 0.15.0) in 'meson.build'.
Version 1.1.0
* Fix bug in 'merge-call' subcommand when
merging databases with the option 'in-place'
active.
* Fix testing to handle SIGABRT and SIGSEGV.
* Move from Travis-CI to Github Actions. Test
the code, build and deploy to Dockerhub.
* Update Dockerfile base image to ubuntu
version 20.04.
Version 1.0.0
* Now sideRETRO works with the CRAM file
format too.
* Fix bug when reading GTF, possibly GFF3
as well, and there is a space into the attr
value: gene_name "My gene name". Now, the
attr will be correctly splited in:
key=gene_name and value='My gene name'.
* Add citation option. Print it in BibTeX
format.
* Add 'gz' interface, in order to wrapper
'libz' library and remove duplicated
code.
* Add the copying notices header to each
source script.
Version 0.14.1
* Add 'git' to Dockerfile. It will avoid
errors when picking the right version.
* Fix the default value for 'phred-quality'.
* Fix building: 'config.h' is dynamically
generated from template 'config.h.in', so
it need to be made before compiling. In order
to assure the required order, declare
'config.h' as a dependency of library and
executable.
Version 0.14.0
* Implement the genotype estimation by using
the likelihood approach as it is defined in
Heng Li paper: "A statistical framework for
SNP calling, mutation discovery, association
mapping and population genetical parameter
estimation from sequencing data - 2.2 (eq2)".
* Add reference and alternate depth counts to
VCF's info.
* Evidence for reads covering the reference
allele is calculated now by the overlapping
between read range and insertion point +/-
read half decil. It is necessary in order to
avoid overestimation of reads covering the
reference allele due to mapping errors.
* Fix when two overlapped clusters share the
same parental gene - maybe the edge points are
reachable, not core points (DBSCAN) - it would
be annotated as overlapped parentals. Now it
will be annotated as PASS and the parental gene
name wont be duplicated (e.g. PTEN/PTEN).
* Add Docker image for this project.
* Add documentation using Sphinx. So far, it is
ready the topics about installation, usage,
molecular gene biology method and simulation
results.
Version 0.13.0
* Improve usage help by splitting all options
according to its category.
* Log all user-given options and arguments.
* Add processing text interface to read 'FASTA'
file format.
* Add VCF's writer interface. It was necessary
to create new INFO and ALT tags for handle
informations related to the retrocopy as:
'PG' for parental gene, 'PGTYPE' for
parental gene type, EXONIC/INTRONIC/NEAR
for retrocopy genomic position - all those
tags for INFO - and <INS:ME:RTC> for ALT.
* Add subcommand 'make-vcf' in order to
manage VCF's file generation.
Version 0.12.0
* Add genotype and haplotype analysis. If
the BAM index is found, perform a fast
search for each retrocopy region, else
index all regions inside an intervalar
tree and make a slow linear search all
over the file.
* To perform genotype and haplotype analysis,
is required to go back to the BAM files -
whose path is saved inside 'source' table.
So, it is necessary that the user maintain
the files where the program expects to find
them.
* Remove bwa subproject, because it won't be
used any longer.
Version 0.11.0
* Add retrocopy annotation: Insertion window
and point, orientation rho and p-value,
level PASS, OVERLAPPED, NEAR, HOTSPOT,
AMBIGUOUS.
* Add deduplication capability. Remove
duplicated reads but one, which is called
the primary read. Other tools, specilized
in remove duplications, use some metric to
choose the primary reads. For us, it is just
interesting the coordinates, so the primary
reads are choosen randomly - mostly the
first one to appear.
* There is the possibility that different BAM
files share reads with the same query name.
In order to avoid a mess to find the right
mate, use the source_id along with qname when
required to match reads from the same fragment.
Version 0.10.0
* Add reclustering step in order to filtering
low number of reads comming from a given
source (BAM). When those reads are removed,
may occur that the cluster become rarefied,
and therefore, invalid according to DBSCAN
constraints.
* Add processing text interface to read 'BED'
file format.
* Add blacklist interface and tables blacklist,
overlapping_blacklist. The interface was
inspired in 'exon.c' way. Also add more
command-line options for indexing blacklisted
regions from GTF/GFF3/BED files.
* Add gff filtering capabilities. The user must
initiate a type GffFilter with the feature_type
to be filtered (e.g. gene, transcript, exon)
and may add attributes aswell (e.g.
gene_type=protein_coding). The attributes are
hard and soft - which mean, hard attributes
must all match the pattern (AND); soft
attributes, at least, must match one
pattern (OR).
* Cluster filter: NONE, CHR, DIST, REGION, SUPPORT.
The philosophy now is to keep all clusters and
subclusters and add a new column to handle
the filtering steps.
Version 0.9.0
* Add options '--blacklist-chr' and
'--parental-distance' to 'merge-call' command.
This way, it's possible to filter reads from
chromosomes (as chrM), and avoid clustering
inside, or very near, its own parental gene.
* 'DBSCAN' is not entirely deterministic: border
points that are reachable from more than one
cluster can be part of either cluster,
depending on the order the data are processed
(Wikipedia).
Fix the 'REACHABLE' points which are shared
among multiple clusters.
* Add statistics 'pearson' and 'spearman'
correlation brought from 'GSL'. They will be
necessary for calculating the retrocopy
orientation.
Version 0.8.0
* Add new extensible hash algorithm. Move from
chaining hash type to extensible hash. Now it
has no need to declare the hash size in 'new'
function, because it is dynamically allocated.
* Add a 'set' interface based on 'hash', instead
of 'list'. It was necessary for speed up the
'DBSCAN' when making the union operation.
* In 'process-sample' command, add phred quality
score filter option in order to avoid low mapped
quality reads.
* Add SQL INDEX for alignment(qname) table for
speed up query.
* Update clustering query statement in order to
avoid the abnormal reads that fall into their
own parental gene.
* Fix bug in 'exon.c'. The alignment_id was
included inside ExonTree object, which in turn
was shared among all threads. With no mutex,
may occur shocking among all alignment_id values.
In order to fix it, a new private struct keeps
'ExonTree' and 'alignment_id' separately.
* Move from 'Autotools' to 'Meson build system'.
Version 0.7.0
* Change abnormal interface to handle sorted
and unsorted SAM/BAM files.
* processing-sample subcommand automatically
detects if the alignment file is queryname
sorted or not and then choose the right
abnormal interface.
* Add a new table 'schema' with a single column
called 'version'. Its function is to keep
track of the database schema state in a
versioned way.
Version 0.6.0
* Add DBSCAN algorithm - Density-Based Spatial
Clustering of Applications with Noise. Its
purpose is making an one-dimensional
groupping of all reads per chromosome.
* Add CLI - Command Line Interface - based
in subcommands: For now, there are two
subcommands: processing-sample (alias
ps) and merge-call (alias mc).
* Add low-level wrappers on SQLite3 functions
in order to avoid so many testing against
each statement.
* Improve testing coverage with gcov, lcov
and COVERALLS platform.
Version 0.5.0
* Add a 'str' interface to handle string
memory manipulation more efficiently.
* 'ibitree' lookup mechanism reshaped: Besides
the range to search for, it acceps now node,
interval overlapped fractions and the bitwise
boolean (AND or OR) for testing if both, node
and interval must overlap each other at the
same percentage - or if just one being true
is enough. Also, 'ibitree' keeps track of the
position and length of overlapping regions.
* Add 'thpool' interface for thread pool.
* Add 'SQLite3' library to manage intermediate
and possibly final results.
* 'abnormal' filtering is working. It selects
the so called abnormal alignments: Thoses
alignments, whose each sequenced read of the
fragment falls into different chromosomes, or
is far way from its mate. Also it catches
supplementary alignments. All the results
are recorded into the 'SQLite3' database.
* Add 'exon' interface. Its job is seach for
abnormal alignments that overlap some exon.
Version 0.4.0
* Add an 'align' interface to save 'bwa mem'
output in 'bwa' format.
* Borrow the 'samtools sort' algorithm and
implement it at sam_sort interface.
* Add a 'binary tree' data structure and, on top
of it, implement an 'AVL interval tree' to handle
searching genomic annotation positons more
efficiently.
* Add processing text interface to read 'gff/gtf'
file format.
* Initiate 'abnormal' filter algorithm.
Version 0.3.0
* Use 'git submodule' to handle local building of
'htslib' and 'bwa'. Those codes are statically
linked against our software.
* Add 'bwa mem' and 'sam' wrappers.
Version 0.2.0
* Add array, list and hash data structures.
* Add logging interface to manage messages level.
* Wrapping of standard c functions for allocating
memory and opening file.
Version 0.1.0
* Initial version. Project was set to work with
'autotools' and testing with the framework 'check'.