Validate Fasta File Change Log
Version 2.3.7994; November 20, 2021
- Fix logic bug for rules that have false for MatchIndicatesProblem
- Warn if the protein sequence is likely DNA bases
- Report an error if a line starts with a non-breaking space or tab
- Add ability to read .fasta.gz files
- Look for the Unicode replacement character after protein names
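Note: a minimal sketch of how three of the checks in this release might look. It is written in Python for illustration and is not the program's own C# code. The function names, the use of only A/C/G/T as DNA letters, and the 95% threshold are assumptions; this change log does not say how "likely DNA" is decided.

```python
NBSP = "\u00a0"         # non-breaking space
REPLACEMENT = "\ufffd"  # Unicode replacement character

DNA_BASES = frozenset("ACGT")  # assumption: only these letters count as DNA


def looks_like_dna(residues: str, threshold: float = 0.95) -> bool:
    """Return True if nearly all letters are DNA bases.

    The threshold is a hypothetical value chosen for this sketch.
    """
    letters = [c for c in residues.upper() if c.isalpha()]
    if not letters:
        return False
    dna_count = sum(1 for c in letters if c in DNA_BASES)
    return dna_count / len(letters) >= threshold


def starts_with_nbsp_or_tab(line: str) -> bool:
    """Return True if the line begins with a non-breaking space or a tab."""
    return line.startswith((NBSP, "\t"))


def has_replacement_char_after_name(header_line: str) -> bool:
    """Return True if U+FFFD appears directly after the protein name.

    Assumes the name ends at the first space, tab, or U+FFFD.
    """
    text = header_line[1:] if header_line.startswith(">") else header_line
    for ch in text:
        if ch == REPLACEMENT:
            return True
        if ch in (" ", "\t"):
            return False
    return False
```

Since A, C, G, and T are also valid amino acid codes, a ratio test like this can misfire on very short sequences, which is one reason to report a warning rather than an error.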
Version 2.2.7887; August 5, 2021
- When /O is used to specify an output directory, the fixed FASTA file will now be created in the output directory instead of the input directory
Version 2.2.7794; May 4, 2021
- Convert to C#
- Add .faa as a supported file extension
- Report a warning if there is a non-breaking space after the protein name
- Add test file with UTF-8 encoding
Version 2.2.7410; April 15, 2020
- Check for long lines (with millions of residues or invalid characters)
- Update to .NET 4.7.2 and update PRISM.dll from NuGet
Version 2.2.7006; March 8, 2019
- Use ComputeStringHashSha1 in PRISM.dll
Version 2.2.6622; February 17, 2018
- Remove interfaces and use FileProcessor classes in PRISM.dll
Version 2.2.6471; September 19, 2017
- Update to .NET 4.6.2
- Use methods in PRISM.dll
- Convert from Hashtable to Dictionary
- Convert from ArrayList to generic lists
Version 2.1.6060; August 4, 2016
- Increase default maximum protein name length from 34 to 60 characters
Version 2.1.5926; March 23, 2016
- Fix bug determining the spanner key length for protein names
- Add warnings for protein residues B, J, O, U, X, or Z (previously treated J as an error and X as a warning)
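Note: a small sketch of the residue warning described above, in Python for illustration only. The function name and the upper-casing step are assumptions, not taken from the program.

```python
from collections import Counter

# Residue letters that produce a warning, per the entry above.
WARNING_RESIDUES = frozenset("BJOUXZ")


def count_warning_residues(residues: str) -> dict:
    """Count occurrences of each warning residue in a sequence."""
    counts = Counter(c for c in residues.upper() if c in WARNING_RESIDUES)
    return dict(sorted(counts.items()))
```

Calling count_warning_residues on a sequence with no such letters returns an empty dictionary, so a caller can use the result directly to decide whether to emit a warning.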
Version 2.1.5885; February 11, 2016
- Updated behavior of /HashFile to store protein names in memory, not sequence hash values
  - Results in decreased memory usage
- Updated clsNestedStringDictionary and clsNestedStringIntList to use StringComparer.Ordinal
- Now removing leading and trailing whitespace from residues
- Now auto-removing any asterisks at the end of protein sequences
Version 2.1.5884; February 10, 2016
- Added switch /HashFile for loading a file previously created using /B
- When /HashFile is used, the hash values are stored in a list rather than in a dictionary, reducing memory usage
Version 2.1.5877; February 3, 2016
- Replaced dictionaries with clsNestedStringDictionary to allow for decreased memory usage
- Replaced struct with a class for better memory allocation
- Now reporting memory usage at regular intervals
- Added clsStackTraceFormatter
Version 2.1.5773; October 22, 2015
- Added option AllowAllSymbolsInProteinNames
Version 2.1.5611; May 13, 2015
- Now allowing pound signs (#) in protein names
Version 2.1.5605; May 7, 2015
- Implemented switches /AllowDash and /AllowAsterisk (previously recognized, but had no effect)
- Added switches /SkipDupeSeqCheck and /SkipDupeNameCheck
- Fixed bug in modMain that would set the rules using SetOptionSwitch but would not re-define the rules by calling SetDefaultRules
Version 2.1.5416; October 30, 2014
- Now adding a summary line to the stats listing number of proteins, number of residues, and file size in KB
Version 2.1.5371; September 15, 2014
- Now allowing for files to be opened even if another application has them open with a read/write lock
Version 2.1.5350; August 25, 2014
- Now limiting the protein description to 7995 characters in length when consolidating duplicate proteins
- Made ComputeProteinHash public
Version 2.1.5053; November 1, 2013
- Added clsProcessFilesOrFoldersBase
Version 2.1.4808; March 27, 2013
- Switched to AnyCPU
- Removed debug statement
Version 2.1.4808; March 1, 2013
- Updated ValidateFastaFile.dll to .NET 4
- New version of clsParseCommandLine.vb and clsProcessFilesBaseClass.vb
Version 2.1.4646; September 20, 2012
- Added option /KeepSameName
- Added option /V to indicate that invalid residues (non-letters) should be removed
- Updated to .NET 4
- Replaced several hashtable objects with generic lists
Version 2.0.4486; April 13, 2012
- Now showing error messages at the console instead of as popup message boxes
Version 2.0.4472; March 30, 2012
- Allow equals and plus signs in protein names
Version 2.0.4415; February 2, 2012
- Changed DEFAULT_MAXIMUM_PROTEIN_NAME_LENGTH to a public constant
Version 2.0.4276; September 16, 2011
- Updated to Visual Studio 2010
- Replaced .Now with .UtcNow
Version 2.0.3863; July 30, 2010
- Updated to Visual Studio 2008 (.NET 2.0)
- Now auto-enabling mCheckForDuplicateProteinNames if generating a fixed fasta and RenameProteinsWithDuplicateNames = True
Version 2.0.3637; December 16, 2009
- Fixed bug that failed to output the column number to the FastaFileStats text file
Version 2.0.3597; November 6, 2009
- Added option AllowDashInResidues
Version 2.0.3131; July 28, 2008
- Added option SaveBasicProteinHashInfoFile
  - Useful for parsing huge .Fasta files to look for duplicate protein sequences without storing the protein names and sequences in memory
Version 2.0.3044; May 2, 2008
- Added a new section to the XML parameter file: ValidateFastaFixedFASTAFileOptions
  - Moved parameters GenerateFixedFASTAFile and SplitOutMultipleRefsinProteinName into this new section
  - Renamed parameters FixedFastaRenameDuplicateNameProteins and FixedFastaConsolidateDuplicateProteinSeqs to RenameDuplicateNameProteins and ConsolidateDuplicateProteinSeqs
- Added additional Fixed Fasta options
  - Ability to control whether or not long protein names are truncated (setting TruncateLongProteinNames)
  - Added option WrapLongResidueLines
    - When true, wraps residue lines to length MaximumResiduesPerLine (default is 120 characters)
  - Added option RemoveInvalidResidues
    - When true, characters that are not A-Z will be removed from the residues
    - When false, no residue characters are removed
- You can now define additional special character lists in the XML parameter file:
  - LongProteinNameSplitChars
  - ProteinNameInvalidCharsToRemove
  - ProteinNameFirstRefSepChars
  - ProteinNameSubsequentRefSepChars
- Added new option for splitting out multiple references from protein names, whereby this process is only performed if the name matches a known pattern
  - Known patterns are IPI, gi, and jgi
  - Examples:
    - IPI:IPI00048500.11|ref|23848934 is split to IPI:IPI00048500.11 ref|23848934
    - gi|169602219|ref|XP_001794531.1| is split to gi|169602219 ref|XP_001794531.1|
    - jgi|Batde5|90624|GP3.061830 is split to jgi|Batde590624 GP3.061830
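Note: the fixed FASTA options in this release can be sketched as follows. This is illustrative Python, not the program's code. All function names and the regular expressions are assumptions. Only the IPI and gi patterns are covered, to keep the sketch short. The residue filter follows the A-Z wording literally and therefore treats lowercase letters as invalid, which may not match the program's behavior.

```python
import re


def remove_invalid_residues(residues: str) -> str:
    """Drop every character that is not A-Z (RemoveInvalidResidues = true)."""
    return re.sub(r"[^A-Z]", "", residues)


def wrap_residues(residues: str, max_per_line: int = 120) -> list:
    """Split residues into lines of at most max_per_line characters.

    max_per_line mirrors MaximumResiduesPerLine (default 120).
    """
    return [residues[i:i + max_per_line]
            for i in range(0, len(residues), max_per_line)]


# Hypothetical patterns: group 1 is the first reference, group 2 is the rest.
_SPLIT_PATTERNS = [
    re.compile(r"^(IPI:[^|\s]+)\|(.+)$"),
    re.compile(r"^(gi\|[^|\s]+)\|(.+)$"),
]


def split_first_reference(name: str) -> str:
    """Replace the separator after the first reference with a space.

    Names that match no known pattern are returned unchanged.
    """
    for pattern in _SPLIT_PATTERNS:
        match = pattern.match(name)
        if match:
            return f"{match.group(1)} {match.group(2)}"
    return name
```

Restricting the split to known prefixes avoids mangling names that merely happen to contain a vertical bar.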
Version 2.0.2900; December 10, 2007
- When generating a fixed fasta file, the program can now ignore I and L residue differences when consolidating proteins that have duplicate sequences
  - Use /L in the command line
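Note: one way to ignore I/L differences is to normalize both residues to a single letter before comparing sequences. The Python sketch below illustrates the idea. The function name is hypothetical and mapping I to L (rather than L to I) is an arbitrary choice, not something this change log specifies.

```python
def consolidation_key(residues: str, ignore_il: bool = False) -> str:
    """Return the string used to decide whether two sequences are duplicates.

    With ignore_il set, isoleucine (I) and leucine (L) compare as equal.
    """
    return residues.replace("I", "L") if ignore_il else residues


# Two sequences that differ only at an I/L position.
a, b = "PEPTIDE", "PEPTLDE"
same_when_ignoring = consolidation_key(a, True) == consolidation_key(b, True)
same_by_default = consolidation_key(a) == consolidation_key(b)
```

Isoleucine and leucine have the same mass, which is why some proteomics workflows treat them as interchangeable.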
Version 2.0.2896; December 6, 2007
- When generating a fixed fasta file, the program can now optionally consolidate proteins that have duplicate sequences
  - Use /D in the command line
Version 2.0.2771; August 3, 2007
- When generating a fixed fasta file, the program will also now create a text file that lists the hash value for each unique sequence found in the input file
  - Also included is the first protein name for each sequence, and details on the other proteins that have the same exact sequence
  - Additionally, if duplicate proteins are found, then a mapping file will be created that lists the first protein name, the sequence length, and each duplicate protein name
- Removed the dependence on PRISM.Dll and SharedVBNetRoutines.dll by adding files clsParseCommandLine.vb and clsXmlSettingsFileAccessor.vb to the project
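Note: the hash listing and duplicate mapping described above can be sketched as below. This is illustrative Python only. The function names and data layout are hypothetical, and SHA-1 is an assumption for this release (a later entry mentions ComputeStringHashSha1, but this entry does not name the hash algorithm).

```python
import hashlib


def sequence_hash(residues: str) -> str:
    """Hash a protein sequence (SHA-1 is assumed for this sketch)."""
    return hashlib.sha1(residues.encode("utf-8")).hexdigest()


def build_duplicate_map(proteins):
    """Group proteins by sequence hash.

    proteins: iterable of (name, residues) tuples in file order.
    Returns (unique, mapping) where
      unique  maps hash -> {"first": name, "length": n, "duplicates": [names]}
      mapping is a list of (first_name, sequence_length, duplicate_name) rows.
    """
    unique = {}
    mapping = []
    for name, residues in proteins:
        key = sequence_hash(residues)
        entry = unique.get(key)
        if entry is None:
            # First time this sequence is seen: remember its first protein name.
            unique[key] = {"first": name, "length": len(residues), "duplicates": []}
        else:
            entry["duplicates"].append(name)
            mapping.append((entry["first"], entry["length"], name))
    return unique, mapping
```

Each row in mapping corresponds to one line of the mapping file: the first protein name, the sequence length, and one duplicate protein name.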
Version 2.0.2769; August 1, 2007
- Added option to rename proteins with duplicate names rather than skipping them when generating a fixed fasta file
  - When renaming, first tries appending "-b" to the original protein name, then checks if this new name matches any other protein names
  - If the new name still results in a duplicate protein, then "-c", "-d", "-e", etc. are tried
- Fixed bug that incorrectly allowed residue lines to be written to a fixed fasta file when an invalid protein was encountered (having a duplicate name or a space after the > symbol)
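Note: the renaming scheme for duplicate protein names can be sketched as follows (Python, for illustration). The function name is hypothetical. Raising an error after "-z" is an assumption, since this change log does not say what happens once the letters run out.

```python
import string


def unique_name(name: str, used: set) -> str:
    """Return a name not yet in used, appending -b, -c, ... as needed.

    The returned name is added to used so later calls see it.
    """
    if name not in used:
        used.add(name)
        return name
    for letter in string.ascii_lowercase[1:]:  # 'b' through 'z'
        candidate = f"{name}-{letter}"
        if candidate not in used:
            used.add(candidate)
            return candidate
    # Assumption: this sketch gives up after '-z'.
    raise ValueError(f"No unused suffix available for {name!r}")
```

Checking each candidate against the set of names already written covers the case where the input file itself contains a protein named, for example, P2-b.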
Version 2.0.2684; May 8, 2007
- Added new warning that checks for duplicate protein sequences
- Now clearing the protein name and description each time a line that starts with > is encountered
Version 2.0.2347; June 5, 2006
- Added new warning that checks for protein descriptions over 900 characters long
Version 2.0.2203; January 12, 2006
- No longer trimming spaces from the end of a line since we need to be able to check for residue lines ending in spaces
- Updated the logging (file or console) to include the value and context associated with a given message, warning, or error
Version 2.0.2103; October 4, 2005
- Updated the SaveSettingsToParameterFile function to re-open the .Xml file and replace instances of "&gt;" with ">" to improve readability
Version 2.0.2056; August 18, 2005
- Added new validation rule to look for escape code characters in the protein description
- Added option to fix the line terminator to guarantee it ends in CRLF
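Note: guaranteeing a CRLF terminator amounts to stripping whatever line ending is present and appending CRLF. A minimal Python sketch, with a hypothetical function name:

```python
def ensure_crlf(line: str) -> str:
    """Return the line with any trailing CR/LF characters replaced by CRLF."""
    return line.rstrip("\r\n") + "\r\n"
```

The same function handles lines ending in LF only, CR only, CRLF, or no terminator at all.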
Version 2.0.2028; July 22, 2005
- Stable release