Write abstract #6

Open · sjackman opened this issue Oct 18, 2016 · 8 comments

@sjackman
Contributor

Hi, all. We have to write our scientific and lay abstract. See hackseq/October_2016#76

Please write a few sentences about your work here. I'll put them all together. Thanks!

sjackman self-assigned this Oct 18, 2016
@GlastonburyC

I tested grid-search-based parameter optimization with the Python tool ParOpt. I wrote a wrapper function in Python that runs ABySS and returns the single value we were interested in optimizing, N50. First I optimized the k-mer length, k, and then extended the method to optimize four parameters simultaneously: k, s, n, and l. Finally, to assess the quality of the genome assemblies, I generated QUAST reports for all optimization runs, using the reference genome to evaluate the completeness of each assembly.
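A minimal sketch of such a wrapper (using a plain itertools grid search in place of ParOpt; the file names, parameter ranges, and abyss-fac parsing are illustrative assumptions, not the exact setup used):

```python
# Illustrative sketch only: plain grid search stands in for ParOpt, and the
# file names, parameter ranges, and abyss-fac parsing are assumptions.
import itertools
import subprocess

def assemble_and_score(k, s, reads="200k.fq", name="grid"):
    """Run abyss-pe with the given parameters and return the scaffold N50."""
    subprocess.run(["abyss-pe", "name={}".format(name), "k={}".format(k),
                    "s={}".format(s), "in={}".format(reads)], check=True)
    # abyss-fac prints a tab-separated table of contiguity statistics.
    out = subprocess.run(["abyss-fac", "{}-scaffolds.fa".format(name)],
                         capture_output=True, text=True, check=True)
    header, values = [line.split("\t") for line in out.stdout.splitlines()[:2]]
    return int(values[header.index("N50")])

# Grid search over k and s; extend the product() call to cover n and l too.
best_k, best_s = max(itertools.product(range(25, 41, 5), range(200, 1001, 200)),
                     key=lambda ks: assemble_and_score(*ks))
print("best k={}, s={}".format(best_k, best_s))
```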

@drveera

drveera commented Oct 18, 2016

I wrote an R script to plot a heat map of the output of the Spearmint optimization runs.
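The original script was written in R; for consistency with the rest of this thread, here is a rough Python sketch of the same idea, assuming the Spearmint evaluations have been exported to a CSV with columns k, s, and N50 (the file and column names are assumptions):

```python
# Sketch only: assumes the Spearmint evaluations were exported to a CSV with
# columns k, s, N50; the original heat map was produced by an R script.
import pandas as pd
import matplotlib.pyplot as plt

runs = pd.read_csv("spearmint_runs.csv")
grid = runs.pivot_table(index="s", columns="k", values="N50")

plt.imshow(grid.values, origin="lower", aspect="auto", cmap="viridis",
           extent=[grid.columns.min(), grid.columns.max(),
                   grid.index.min(), grid.index.max()])
plt.colorbar(label="N50")
plt.xlabel("k")
plt.ylabel("s")
plt.title("N50 over Spearmint evaluations")
plt.savefig("spearmint_heatmap.png")
```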

@hyounesy
Contributor

hyounesy commented Oct 19, 2016 via email

@lisabang
Contributor

I used Jupyter notebook widgets with matplotlib to visualize how each optimization iteration changes the parameter values and the metric being maximized. An example .ipynb file is available.
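A minimal sketch of that kind of widget (the per-iteration (k, N50) pairs below are made-up placeholder values; in the notebook they would come from the optimization log):

```python
# Sketch only: `history` holds made-up (k, N50) pairs standing in for the
# values recorded during optimization.
import matplotlib.pyplot as plt
from ipywidgets import IntSlider, interact

history = [(20, 5000), (32, 9000), (28, 12000), (30, 15000)]

def show_progress(i=0):
    """Plot the N50 achieved up to and including iteration i."""
    ks, n50s = zip(*history[: i + 1])
    plt.plot(range(i + 1), n50s, marker="o")
    plt.xlabel("iteration")
    plt.ylabel("N50")
    plt.title("iteration {}: k={}, N50={}".format(i, ks[-1], n50s[-1]))
    plt.show()

interact(show_progress, i=IntSlider(min=0, max=len(history) - 1, step=1))
```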

@jasgrewal
Contributor

Genetic algorithms (GAs) are a family of adaptive algorithms that have been applied to many optimization problems to identify Pareto frontiers (the multi-metric optimal solution space of an evaluated function). GAs are also attractive for optimization because they are tractable for global optimization of several parameters simultaneously, even in the presence of many local optima.

The algorithm chosen for this project was the Non-dominated Sorting Genetic Algorithm II (NSGA-II), which can optimize multiple parameters while considering several output metrics at once. Initially proposed by K. Deb et al. in 2002, NSGA-II converges rapidly. I used the implementation in 'mco', an R package for multi-objective, multi-parameter optimization with genetic algorithms (https://cran.r-project.org/web/packages/mco/index.html). I searched for an optimal solution space over pre-determined ranges of the discrete-valued ABySS parameters k and s, with N50 and L50 as the output metrics.

NSGA-II did converge rapidly in both the single- and multi-parameter (and single- and multi-metric) optimizations. Interestingly, the best N50/L50 pair, (23699, 4), was found when k and s were optimized simultaneously and both N50 and L50 were being optimized. The solution set in that case, defined as (k, s), was {(35, 650), (35, 690)}; this was the only scenario in which the GA proposed multiple solutions converging to the same output metrics (in every other parameter-versus-metric scenario it converged to a single solution). Furthermore, when the search was distributed over both k and s but optimized only with respect to N50, the best N50 was 16461, about 30% lower than the best overall.

An additional outcome of this exercise was demonstrating the utility of GA optimization for discrete-valued parameters. Many of the optimization algorithms and tools evaluated in this project (including mco) assume a continuous range, so we added a rounding step to the query-building process for the ABySS tool to mimic a discretized solution space. Notably, NSGA-II still converged to a fixed-valued optimal solution space, showing its potential for discrete optimization problems.
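The rounding step can be illustrated with a small sketch (in Python for consistency with the rest of this thread; the original implementation used R and the mco package, and the helper and bounds below are assumptions):

```python
# Sketch of the discretization idea only: the GA proposes continuous (k, s)
# candidates, which are snapped to valid discrete ABySS values before the
# assembly command is built. The synthetic run_abyss() stand-in just makes
# the sketch self-contained.
def snap(value, lo, hi, step=1):
    """Round a continuous candidate to the nearest valid discrete value."""
    snapped = round(value / step) * step
    return int(min(max(snapped, lo), hi))

def run_abyss(k, s):
    # Synthetic stand-in; in practice this would run abyss-pe and parse
    # N50 and L50 from the assembly statistics.
    return 20000 - 100 * abs(k - 35) - abs(s - 650), 4 + abs(k - 35) // 5

def objectives(candidate):
    """Map a continuous (k, s) candidate to the (N50, L50) metrics."""
    k = snap(candidate[0], lo=20, hi=60)              # k-mer length, integer
    s = snap(candidate[1], lo=200, hi=1000, step=10)  # seed length, steps of 10
    n50, l50 = run_abyss(k, s)
    return (-n50, l50)  # minimizer convention: maximize N50, minimize L50
```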

@jasgrewal
Contributor

Apologies for making it so long; I clubbed together my introduction and some of the interesting observations. Cheers guys, it was great working with all of you!

@lfunderburk
Contributor

I tested a given data set (500k.fq?) with Bayesian optimization using Spearmint and MongoDB. I collaborated with Shaun Jackman on this subpart and extended abyss.py to optimize two parameters, k and s. After 20 iterations, a maximum N50 of 19356 was found at k=28, s=200.
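A hedged sketch of what the Spearmint-facing objective could look like: Spearmint calls a main(job_id, params) function in the experiment's Python module and minimizes its return value. The parameter handling and the synthetic stand-in for the assembly run below are assumptions made to keep the sketch self-contained, not the actual abyss.py code.

```python
# Sketch only: follows Spearmint's main(job_id, params) convention. The
# stand-in below replaces the real ABySS run; in the project, abyss.py ran
# the assembly and parsed N50 from the assembly statistics.
def run_assembly(k, s):
    # Placeholder objective surface, not real assembly results.
    return 20000 - 40 * (k - 30) ** 2 - 0.05 * (s - 500) ** 2

def main(job_id, params):
    k = int(params["k"][0])
    s = int(params["s"][0])
    n50 = run_assembly(k, s)
    # Spearmint minimizes the returned value, so negate N50 to maximize it.
    return -float(n50)
```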

@yxblee
Contributor

yxblee commented Oct 19, 2016

I've implemented functions in Python to call ABySS and initiate assembly runs for the OPAL (OPtimization ALgorithm) program, a Python/Cython framework that drives a black-box solver written in C/C++ and is based on the NOMAD (Nonlinear Optimization by Mesh Adaptive Direct Search) algorithm. This allowed me to test optimization of the k value, with N50 as the measure to maximize, on the 200k dataset.

While the NOMAD algorithm behind OPAL does support multiple continuous and discrete parameters, I have yet to try them. For reporting results, I still need a way to record and display all of the inputs and outputs across runs, since the program currently writes only the final optimized value of the given measure to a single file, which is overwritten after each run. I also need to add some conditional checks so the program does not hang indefinitely on a failed ABySS make caused by invalid inputs, and does not skip over files already created by previous runs.
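One way to address the logging and failure-handling points, independent of OPAL's own reporting, is to have the wrapper itself append every evaluation to a log and return a penalty score when the ABySS make fails; a rough sketch (the file names, timeout, and penalty value are assumptions):

```python
# Sketch only: every evaluation is appended to a CSV log rather than being
# overwritten, and a failed or stalled abyss-pe run yields a penalty score
# instead of halting the optimization. File names, the timeout, and the
# penalty value are assumptions.
import csv
import subprocess

LOG = "opal_runs.csv"
FAILED_SCORE = 0  # worst-possible N50 reported for an invalid parameter set

def evaluate_k(k, reads="200k.fq"):
    name = "opal-k{}".format(k)
    try:
        subprocess.run(["abyss-pe", "name={}".format(name), "k={}".format(k),
                        "in={}".format(reads)], check=True, timeout=3600)
        out = subprocess.run(["abyss-fac", "{}-scaffolds.fa".format(name)],
                             capture_output=True, text=True, check=True)
        header, values = [l.split("\t") for l in out.stdout.splitlines()[:2]]
        n50 = int(values[header.index("N50")])
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        n50 = FAILED_SCORE
    with open(LOG, "a", newline="") as fh:
        csv.writer(fh).writerow([k, n50])
    return n50
```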
