This section shows how to use the wrapper function in MarkovWeightedEFMs.jl to enumerate and assign AEFMs weights in a simple, multispecies reaction network.
using MarkovWeightedEFMs
S = [#
1 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # Glc
0 -1 0 0 0 -1 0 0 0 0 0 0 1 0 0 0 # ATP
@@ -61,36 +61,158 @@
"C(C1C(C(C(O1)(COP(=O)(O)O)O)O)O)OP(=O)(O)O",
"C([C@H](C=O)O)OP(=O)(O)O",
"C(C(=O)COP(=O)(O)O)O"
-]
We can check that the flux vector satisfies the steady state requirements.
all(S * v .== 0) # should evaluate as true
The following functions check for issues with the inputs. The first function find_atomic_chmc_input_errors
identifies possible problems with the stoichiometry matrix and flux vector. These problems, except for the steady state flux requirement, can be addressed via correct_atomic_chmc_input_errors
. Finally, the last function correct_atomic_chmc_input_smiles
checks and fixes problems relating to the SMILES strings.
# Confirm there are no issues with stoichiometry matrix
-errors = find_atomic_chmc_input_errors(S, v)
-print(errors) # summary of errors associated with S/v
-
-# S and v have no errors so the inputs are returned
-correct_atomic_chmc_input_errors(errors, S, mets, rxns)
-# S, mets, rxns = correct_atomic_chmc_input_errors(errors, S, mets, rxns) # otherwise
-
-# Correct issues associated with RXNMapper character limit and pseudometabolites
-S, v, mets, rxns, smiles, logs = correct_atomic_chmc_input_smiles(S, v, mets, rxns, smiles)
At this point, the SMILES strings (matching the updated mets
if there were errors in the initial inputs) should be canonicalized. S
is also converted to a Matrix{Int16}
which is a requirement for subsequent functions.
smiles = canonicalize_smiles(smiles)
The reaction SMILES strings are next constructed from the metabolite SMILES and the atom mapping is performed via RXNMapper. In this tutorial, we will be constructing an atomic CHMC rooted on a particular source metabolite carbon. We precompute an atom tracing dictionary mapping the (carbon) atom in the stoichiometric copy of a substrate to its product across each reaction.
# Construct atom traced SMILES strings
-rs, ms = map_reaction_strings(S, smiles, rxns, false)
-
-# Precompute atom tracing dictionary
-atom = :C # carbon
-atom_max = get_max_atoms(smiles, atom)
-D_C = precompute_atom_tracing_dictionary(S, ms, atom_max, atom) # S must be Matrix{Int16}
+]
-# Identify source metabolites
-src_mets = get_source_metabolites(S)
-max_src_met_carbons = atom_max[src_mets]
-nothing # hide
The following atomic CHMC is rooted on the first carbon atom of the first source metabolite in the stoichiometry matrix 6-phospho-D-gluconate.
I = (src_mets[1], 1, atom) # initial state is 1st carbon of canonicalized glucose
-res = steady_state_efm_distribution(S, v, ms, I, D_C; verbose = false) # S must be Matrix{Int16}
If we only wanted to enumerate the AEFMs, we would run:
enumerate_atomic_efms(S, ms, I, D_C, verbose = false)
Both functions produce the same output structure res
, except that the AEFM flux decomposition field will be a vector of zeros.
The output res
is an immutable struct with 8 fields:
res.i
is a tuple storing (i) the source metabolite index, (ii) source metabolite atom index (based on canonicalized SMILES string), and (iii) the atom type.
res.e
is an array of AEFMs with all corresponding simple cycle closures.
res.p
is an array of AEFM probabilities normalized to one.
res.w
is an array of AEFM weights normalized by the (unimolecular) reaction flux of the source metabolite.
res.dchmc
is a dictionary storing the ACHMC. The keys are the ACHMC states (composed of Markov chain states in res.dmc
). The values are the ACHMC state and the Markov chain state children.
res.dmc
is a dictionary converting Markov chain states to metabolite-atom positions. The value (0, 0)
always corresponds to the external environment sink node (which connects back to the source metabolite-atom state).
res.T
is a sparse array storing the ACHMC transition probability matrix.
res.R
is an array of tuples storing the reaction index/indices mapped to each ACHMC transition matrix element.
The corresponding AEFMs correspond to the movement of metabolite/atom states through the reaction network. We can convert these states into metabolites using get_efm_metabolite_atom_indices
. Note that there is one fewer metabolite name than AEFM metabolite indices because the pseudometabolite (0, 0)
linking sink and source reactions is omitted.
# First AEFM
-mets[first.(get_efm_metabolite_atom_indices(res, 1))]
# Second AEFM
-mets[first.(get_efm_metabolite_atom_indices(res, 2))]
The following plotting function visualizes the ACHMC rooted on state I
. This is only recommended for exploring ACHMCs of small networks.
using GLMakie # Makie backend
+atom = :C # carbon atom type for AEFMs
We can check that the flux vector satisfies the steady state requirements.
all(S * v .== 0) # should evaluate as true
true
The following function pre-processes the input metabolic network for computing the AEFM weights for the specified atom type.
mdl, atom_info, logs = preprocess_all_for_atomic_chmc(S, v, mets, rxns, smiles, atom)
The variable mdl
is a NamedTuple containing the updated stoichiometry matrix, flux vector, metabolite/reaction names, metabolite SMILES strings, reaction SMILES strings, and mapped reaction SMILES strings.
keys(mdl)
(:S, :v, :mets, :rxns, :smiles, :rs, :ms)
The variable logs
contains details about input metabolic network in addition to listing the pseudometabolites and pseudoreactions dropped from the network based on the input SMILES strings.
keys(logs)
(:model_errors, :smiles_warnings)
print(logs.model_errors)
############################################################
+## ERROR CHECKING STOICHIOMETRY MATRIX AND FLUX VECTOR #####
+# (1) SUM OF ABSOLUTE FLUX RECONSTRUCTION ERROR:
+# 0.0
+# PASSED.
+# (2) REACTIONS THAT ARE DUPLICATES:
+# NONE.
+# PASSED.
+# (3) REACTIONS WTIH ZERO FLUX:
+# NONE.
+# PASSED.
+# (4) REACTIONS WTIH NEGATIVE FLUX:
+# NONE.
+# PASSED.
+# (5) INTERNAL REACTIONS W/ NON-INTEGER STOICHIOMETRIES:
+# NONE.
+# PASSED.
+# (6) UNIMOLECULAR SOURCE REACTIONS W/ STOICH == 1:
+# 1, 13, 16
+# PASSED.
+# (7) UNIMOLECULAR SOURCE REACTIONS W/ STOICH != 1:
+# NONE.
+# PASSED.
+# (8) MULTIMOLECULAR SOURCE REACTIONS W/ STOICH == 1:
+# NONE.
+# PASSED.
+# (9) MULTIMOLECULAR SOURCE REACTIONS W/ STOICH != 1:
+# NONE.
+# PASSED.
+# (10) UNIMOLECULAR SINK REACTIONS W/ STOICH == 1:
+# 4, 11, 12, 14, 15
+# PASSED.
+# (11) UNIMOLECULAR SINK REACTIONS W/ STOICH != 1:
+# NONE.
+# PASSED.
+# (12) MULTIMOLECULAR SINK REACTIONS W/ STOICH == 1:
+# NONE.
+# PASSED.
+# (13) MULTIMOLECULAR SINK REACTIONS W/ STOICH != 1:
+# NONE.
+# PASSED.
+# (14) REACTIONS W/ NO SUBSTRATES OR PRODUCTS:
+# NONE.
+# PASSED.
+# (15) # METABOLITES PARTICIPATING IN NO REACTIONS:
+# NONE.
+# PASSED.
+# STATUS:
+# PASSED. THESE INPUTS SATISFY ATOMIC CHMC REQUIREMENTS.
+############################################################
logs.smiles_warnings
(dropped_rows_pseudometabolites = Int64[], dropped_cols_pseudometabolites = Int64[], dropped_cols_rxnmapper_limit = Int64[])
The variable atom_info
contains the indices of all source metabolites in the updated network and the number of occurrences for the input atom of interest. It also contains an atom-mapping dictionary relating substrate-atom positions to product-atom positions in each reaction. These are useful for programmatically computing AEFMs across all source metabolite-atom combinations of interest.
keys(atom_info)
(:atom, :src_mets, :max_src_met_atoms, :D)
atom_info.src_mets # source metabolite indices
3-element Vector{Int64}:
+ 1
+ 2
+ 8
atom_info.max_src_met_atoms # counts of specified atom in each source metabolite
3-element Vector{Int64}:
+ 6
+ 10
+ 0
atom_info.D # atom-mapping dictionary
Dict{NTuple{4, Int64}, Tuple{Int64, Int64}} with 62 entries:
+ (3, 2, 1, 3) => (5, 2)
+ (2, 2, 1, 6) => (4, 2)
+ (1, 2, 1, 2) => (3, 2)
+ (2, 5, 1, 2) => (4, 5)
+ (3, 3, 1, 5) => (6, 2)
+ (3, 4, 1, 5) => (6, 1)
+ (3, 6, 1, 3) => (5, 4)
+ (2, 6, 1, 6) => (4, 6)
+ (1, 6, 1, 2) => (3, 6)
+ (2, 7, 1, 6) => (4, 7)
+ (11, 3, 1, 10) => (10, 3)
+ (2, 9, 1, 2) => (4, 9)
+ (3, 1, 1, 5) => (6, 6)
+ (3, 2, 1, 5) => (6, 5)
+ (3, 5, 1, 3) => (5, 5)
+ (2, 5, 1, 6) => (4, 5)
+ (9, 3, 1, 7) => (6, 5)
+ (1, 5, 1, 2) => (3, 5)
+ (9, 4, 1, 7) => (6, 6)
+ ⋮ => ⋮
The following atomic CHMC is rooted on the first carbon atom of the first source metabolite in the stoichiometry.
I = (atom_info.src_mets[1], 1, atom) # initial state is 1st carbon of canonicalized glucose
+res = steady_state_efm_distribution(mdl.S, mdl.v, mdl.ms, I, atom_info.D; verbose = false) # S must be Matrix{Int16}
res
CHMCAtomicSummary((1, 1, :C), @NamedTuple{EFM::Vector{Int64}, Closures::Vector{Tuple{Int64, Int64}}}[(EFM = [6, 5, 6], Closures = [(6, 5)]), (EFM = [4, 1, 2, 5, 6, 7, 4], Closures = [(8, 1)]), (EFM = [4, 1, 2, 3, 4], Closures = [(4, 1)]), (EFM = [8, 7, 8], Closures = [(9, 7)]), (EFM = [4, 1, 2, 5, 6, 7, 8, 4], Closures = [(10, 1)])], [0.09000000000000001, 0.56, 0.27, 0.01, 0.07], [1.0, 6.222222222222222, 3.0, 0.11111111111111112, 0.7777777777777778], Dict{Int64, Tuple{Int16, Int16}}(5 => (6, 6), 4 => (0, 0), 6 => (9, 4), 7 => (11, 2), 2 => (3, 1), 8 => (10, 1), 3 => (5, 3), 1 => (1, 1)), Dict{Vector{Int16}, @NamedTuple{id::Int64, children::Vector{Int16}}}([1, 2, 5, 6, 7] => (id = 7, children = [8]), [1, 2, 5, 6] => (id = 6, children = [7]), [1, 2, 5, 6, 7, 4] => (id = 8, children = []), [1] => (id = 1, children = [2]), [1, 2, 3] => (id = 3, children = []), [1, 2, 3, 4] => (id = 4, children = []), [1, 2] => (id = 2, children = [3, 5]), [1, 2, 5] => (id = 5, children = [6]), [1, 2, 5, 6, 7, 8] => (id = 9, children = []), [1, 2, 5, 6, 7, 8, 4] => (id = 10, children = [])…), sparse([4, 8, 10, 1, 2, 3, 2, 6, 5, 6, 9, 7, 7, 9], [1, 1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 8, 9, 10], [1.0, 1.0, 1.0, 1.0, 0.3, 1.0, 0.7, 0.125, 1.0, 0.875, 0.125, 0.875, 0.125, 0.875], 10, 10), @NamedTuple{i::Int64, j::Int64, k::Int16}[(i = 1, j = 2, k = 2), (i = 2, j = 3, k = 3), (i = 2, j = 5, k = 5), (i = 3, j = 4, k = 4), (i = 4, j = 1, k = 4), (i = 5, j = 6, k = 6), (i = 6, j = 5, k = 7), (i = 6, j = 7, k = 8), (i = 7, j = 8, k = 12), (i = 8, j = 1, k = 12), (i = 7, j = 9, k = 10), (i = 9, j = 10, k = 11), (i = 10, j = 1, k = 11), (i = 9, j = 7, k = 9)])
If we only wanted to enumerate the AEFMs, we would run:
res_enum = enumerate_atomic_efms(mdl.S, mdl.ms, I, atom_info.D, verbose = false)
Both functions produce the same output structure res
, except that the AEFM flux decomposition fields will be empty. The transition matrix will also default to uniformly distributed probabilities along each row.
The output res
is an immutable struct with 8 fields:
res.i
is a tuple storing (i) the source metabolite index, (ii) source metabolite atom index (based on canonicalized SMILES string), and (iii) the atom type. This is a copy of the variable I
.
res.i
(1, 1, :C)
res.e
is an array of AEFMs with all corresponding simple cycle closures.
res.e
5-element Vector{@NamedTuple{EFM::Vector{Int64}, Closures::Vector{Tuple{Int64, Int64}}}}:
+ (EFM = [6, 5, 6], Closures = [(6, 5)])
+ (EFM = [4, 1, 2, 5, 6, 7, 4], Closures = [(8, 1)])
+ (EFM = [4, 1, 2, 3, 4], Closures = [(4, 1)])
+ (EFM = [8, 7, 8], Closures = [(9, 7)])
+ (EFM = [4, 1, 2, 5, 6, 7, 8, 4], Closures = [(10, 1)])
res.p
is an array of AEFM probabilities normalized to one.
res.p
5-element Vector{Float64}:
+ 0.09000000000000001
+ 0.56
+ 0.27
+ 0.01
+ 0.07
res.w
is an array of AEFM weights normalized by the (unimolecular) reaction flux of the source metabolite.
res.w
5-element Vector{Float64}:
+ 1.0
+ 6.222222222222222
+ 3.0
+ 0.11111111111111112
+ 0.7777777777777778
res.dchmc
is a dictionary storing the ACHMC. The keys are the ACHMC states (composed of Markov chain states in res.dmc
). The values are the ACHMC state and the Markov chain state children.
res.dchmc
Dict{Vector{Int16}, @NamedTuple{id::Int64, children::Vector{Int16}}} with 10 entries:
+ [1, 2, 5, 6, 7] => (id = 7, children = [8])
+ [1, 2, 5, 6] => (id = 6, children = [7])
+ [1, 2, 5, 6, 7, 4] => (id = 8, children = [])
+ [1] => (id = 1, children = [2])
+ [1, 2, 3] => (id = 3, children = [])
+ [1, 2, 3, 4] => (id = 4, children = [])
+ [1, 2] => (id = 2, children = [3, 5])
+ [1, 2, 5] => (id = 5, children = [6])
+ [1, 2, 5, 6, 7, 8] => (id = 9, children = [])
+ [1, 2, 5, 6, 7, 8, 4] => (id = 10, children = [])
res.dmc
is a dictionary converting Markov chain states to metabolite-atom positions. The value (0, 0)
always corresponds to the external environment sink node (which connects back to the source metabolite-atom state).
res.dmc
Dict{Int64, Tuple{Int16, Int16}} with 8 entries:
+ 5 => (6, 6)
+ 4 => (0, 0)
+ 6 => (9, 4)
+ 7 => (11, 2)
+ 2 => (3, 1)
+ 8 => (10, 1)
+ 3 => (5, 3)
+ 1 => (1, 1)
res.T
is a sparse array storing the ACHMC transition probability matrix.
res.T
10×10 SparseArrays.SparseMatrixCSC{Float64, Int64} with 14 stored entries:
+ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
+ ⋅ ⋅ 0.3 ⋅ 0.7 ⋅ ⋅ ⋅ ⋅ ⋅
+ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
+ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
+ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅
+ ⋅ ⋅ ⋅ ⋅ 0.125 ⋅ 0.875 ⋅ ⋅ ⋅
+ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.875 0.125 ⋅
+ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
+ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.125 ⋅ ⋅ 0.875
+ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
res.R
is an array of tuples storing the reaction index/indices mapped to each ACHMC transition matrix element.
res.R
14-element Vector{@NamedTuple{i::Int64, j::Int64, k::Int16}}:
+ (i = 1, j = 2, k = 2)
+ (i = 2, j = 3, k = 3)
+ (i = 2, j = 5, k = 5)
+ (i = 3, j = 4, k = 4)
+ (i = 4, j = 1, k = 4)
+ (i = 5, j = 6, k = 6)
+ (i = 6, j = 5, k = 7)
+ (i = 6, j = 7, k = 8)
+ (i = 7, j = 8, k = 12)
+ (i = 8, j = 1, k = 12)
+ (i = 7, j = 9, k = 10)
+ (i = 9, j = 10, k = 11)
+ (i = 10, j = 1, k = 11)
+ (i = 9, j = 7, k = 9)
The corresponding AEFMs correspond to the movement of metabolite/atom states through the reaction network. We can convert these states into metabolites using get_efm_metabolite_atom_indices
. Note that there is one fewer metabolite name than AEFM metabolite indices because the pseudometabolite (0, 0)
linking sink and source reactions is omitted.
# First AEFM
+efm_seq_1 = mets[first.(get_efm_metabolite_atom_indices(res, 1))]
efm_seq_1
3-element Vector{String}:
+ "FDP"
+ "F6P"
+ "FDP"
# Second AEFM
+efm_seq_2 = mets[first.(get_efm_metabolite_atom_indices(res, 2))]
efm_seq_2
5-element Vector{String}:
+ "Glc"
+ "G6P"
+ "F6P"
+ "FDP"
+ "DHAP"
The following plotting function visualizes the ACHMC rooted on state I
. This is only recommended for exploring ACHMCs of small networks.
using GLMakie # Makie backend
GLMakie.activate!()
plot_atomic_chmc(res, S, mets, rs)
Each node in the main panel corresponds to a CHMC state (metabolite and atomic index).
Clicking on a CHMC transition will highlight that transition and display the corresponding metabolic reaction on the upper panel. The pair of purple highlighted atoms correspond to the movement of the same atom from the LHS to RHS of the reaction.
Finally, the reaction and mapped reaction SMILES strings can also be plotted as an SVG and previewed using a package like ElectronDisplay. If fname != ""
, the SVG is also saved to file. By default, fname == ""
and the SVG is not saved. The default canvas width and height are 1420 by 580 (pixels) but these can be changed. If using ElectronDisplay and the image is cut off, try resizing the plotting window or reducing the canvas dimensions.
using ElectronDisplay
# Reaction string
-plot_mapped_reaction(rs[2], view=true)
+plot_mapped_reaction(rs[2], view=true, canvas_width = 1420, canvas_height = 580)
#plot_mapped_reaction(rs[2], "\path\to\save\name.svg", view=true)
# Mapped reaction string
plot_mapped_reaction(ms[2], view = true, canvas_width = 1420, canvas_height = 580)
-#plot_mapped_reaction(ms[2], "\path\to\save\name.svg", view = true,)