rheo_paper.tex

\documentclass{article}
\usepackage{amsmath,amssymb,amstext,mathtools,array,url,bm,graphicx,color,epsfig}
\usepackage{fullpage,setspace}
\usepackage{authblk}
\usepackage{filecontents}
\usepackage{natbib}
\usepackage{lineno}
%\usepackage[colorlinks]{hyperref}
%\usepackage{hyperref}
\usepackage{subcaption}
\usepackage{float}
\usepackage[flushleft]{threeparttable}

\def\dt{\partial t}
\def\dx{\partial x}
\def\dy{\partial y}

\def\dz{\partial z}
\def\BE{\begin{equation}}
\def\EE{\end{equation}}
\def\half{\frac{1}{2}}
\def\calT{\cal T}
\def\Deltax{\Delta x}
\def\Deltat{\Delta t}
\DeclareSymbolFont{largesymbolsA}{U}{txexa}{m}{n}
\DeclareMathSymbol{\varprod}{\mathop}{largesymbolsA}{16}


\newtheorem{theorem}{Theorem}
\newtheorem{acknowledgement}[theorem]{Acknowledgement}
\newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{axiom}[theorem]{Axiom}
\newtheorem{case}[theorem]{Case}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{conclusion}[theorem]{Conclusion}
\newtheorem{condition}[theorem]{Condition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{criterion}[theorem]{Criterion}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{notation}[theorem]{Notation}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{solution}[theorem]{Solution}
\newtheorem{summary}[theorem]{Summary}
\newenvironment{proof}[1][Proof]{\noindent\textbf{#1.} }{\ \rule{0.5em}{0.5em}}

\begin{document}

\title{\bf Comparative analysis of the structures and outcomes of geophysical flow models and modeling assumptions using uncertainty quantification}
%\titlerunning{Comparative analysis of geophysical flow models using UQ}

\author[1,6]{\small Abani Patra}
\author[1,2,3]{Andrea Bevilacqua}
\author[4]{Ali Akhavan-Safaei}
\author[5]{\\E. Bruce Pitman}
\author[2]{Marcus Bursik}
\author[2,7]{David Hyman}

\affil[1]{\small\textit{Computational Data Science and Engineering, University at Buffalo, Buffalo, NY}}
\affil[2]{\textit{Department of Earth Sciences, University at Buffalo, Buffalo, NY}}
\affil[3]{\textit{Istituto Nazionale di Geofisica e Vulcanologia, Sezione di Pisa, Pisa, Italy}}
\affil[4]{\textit{Department of Mechanical and  Aerospace Engineering, University at Buffalo, Buffalo, NY}}
\affil[5]{\textit{Department of Materials Design and Innovation, University at Buffalo, NY}}
\affil[6]{\textit{Tufts University, Data Intensive Sciences Center, Medford, MA}}
\affil[7]{\textit{Now at Cooperative Institute for Meteorological Satellite Studies, UW, Madison, WI}}


\date{\texttt{abani@tufts.edu; andrea.bevilacqua@ingv.it; mib@buffalo.edu; pitman@buffalo.edu; dhyman2@wisc.edu}}


\maketitle

\abstract

%We present a new statistically driven method for analyzing the modeling of geophysical flows. Many models have been advocated by different modelers for such flows incorporating different modeling assumptions. Limited and sparse observational data on the modeled phenomena usually does not permit a clean discrimination among models for fitness of purpose, and, heuristic choices are usually made, especially for critical predictions of behavior that has not been experienced.
We advocate here a methodology for characterizing models of geophysical flows and the modeling assumptions they represent, using a statistical approach over the full range of applicability of the models. Such a characterization may then be used to decide the appropriateness of a model and modeling assumption for use. %, and, perhaps as needed  compositions of models for better predictive power.
We present our method by comparing three different models arising from different rheology assumptions, and the output data show unambiguously the performance of the models across a wide range of possible flow regimes. This comparison is facilitated by the recent development of the new release of our TITAN2D mass flow code that allows choice of multiple rheologies. The quantitative and probabilistic analysis of contributions from different modeling  assumptions in the models is particularly illustrative of the impact of the assumptions. Knowledge of which assumptions dominate, and, by how much, is illustrated in %two different case studies: a small scale inclined plane with a flat runway, and
the topography on the SW slope of Volc\'{a}n de Colima (MX). A simple model performance evaluation completes the presentation.

\section{Introduction}
%We present in this paper a simple data driven approach to discriminate among models and the modeling assumptions implicit in each model for complex models of geophysical mass flows. We illustrate the approach by work on geophysical mass flows.
This paper describes an uncertainty quantification driven approach to analyze total models and individual assumptions that are composed  into models of geophysical mass flows.
This is especially relevant when the observations/measurements are not adequate to characterize the behavior of the system which is modeled. Observational { data inadequacy} is rarely characterized even for verified and validated models.

Since models actualize a hypothesis, it follows  that a model articulates a belief about the data. Thus a model will always have some uncertainty in prediction, since the {subjectivity of the belief} can never be completely eliminated \citep{Kennedy2001, Higdon2004}, nor is  the data at hand  usually enough to characterize its behavior at the desired prediction. Principles like ``Occam's razor" and Bayesian statistics \citep{Farrell2015} provide some guidance, but {\it  robust and quantitative approaches that allow the  testing of model components for fitness }need to be developed. In related work, we have shown that the  application of the empirical falsification principle \citep{Popper1959} over an arbitrarily wide envelope of possible inputs reduces the subjectivity and uncertainty in a case study where the available data was not adequate \citep{Bevilacqua2019}.
%{\bf NEED to MOTIVATE WITH SYSTEM COMPLEXITY and MANY MODELS?}

There are usually numerous models for representing complex systems with sparse observations, like large scale geophysical mass flows \cite{Kelfoun2011}. It is often difficult to decide which of these models are appropriate for a particular analysis. Nevertheless, ready availability of many models as reusable software tools makes it the user's burden to select one appropriate for their purpose.
For example, the $\mathrm{4^{\mathrm{th}}}$ release of TITAN2D\footnote{available from vhub.org}, already adopted in \citep{Bevilacqua2019}, offers multiple rheology options in the same code base \citep{Simakov2019}. The availability of different models for similar or the same phenomena in the same tool provides us the ability to directly compare outputs and internal variables in all the models while controlling for difficult to quantify effects like numerical solution procedures, input ranges and computer hardware. This can improve the process for integrating information from multiple models \citep{Bongard2007}. Given a particular problem for which predictive analysis is planned, the information generated and its comparison to available observational data can be used to guide model choice and input space refinement \citep{Bevilacqua2019}. However, as we have discovered, such comparison requires a careful understanding of each model and its constituents and a well-organized process like that which we describe here for such a comparison. In this study, we focus on the comparison between models, more than on the input space refinement problem. %We begin with precise, albeit limited definitions of models and their constituents.

%This paper presents a systematic approach to the study of such models of complex systems, motivated by and applied to the case of geophysical mass flow models.
%In this study we describe a methodology for characterizing geophysical flow models and the modeling assumptions they represent.
A modeling assumption is essentially a simple postulate framing direct relationships among quantities under study. Models are compositions of many such assumptions. The study of models is, thus, a study of these assumptions and their composability and applicability. For complex systems good models may contains needless assumptions that may be removed or a good assumption could greatly improve a different model. In practice, these are usually subjective choices, not data driven. Moreover, assumptions needed may change as the system evolves, making {\em model choice} more difficult \citep{Patra2018b}.

In this study, we analyze the general features that differ among models  in a probabilistic framework, oriented to extrapolation and forecast. More significantly, we describe and compare them using newly introduced  concepts of dominance factors and expected contributions. This type of analysis, enabled by our approach, allows us to quantitatively evaluate modeling assumptions and their relative importance.

\section{Methods and Data Sources}
The statistically driven method introduced in this study for analyzing complex models provides extensive and quantitative information. Geophysical flow modeling usually compares simulation to observation, and fits the model parameters using the solution of a regularized inverse problem. Nevertheless, this is not always sufficient to solve forecasting problems, in which the range of possible flows might not be limited to a single type and scale of flow. Our approach is different, and evaluates the statistics of a range of flows, produced by the couple $\left(M, \mathcal P_M\right)$ - i.e. a model $M$ and a probability distribution for its parameters $\mathcal P_M$. New quantitative information can solve classical qualitative problems, either model-model or model-observation comparison. The mean plot represents the average behavior of flows in the considered range, and provides the same type of output that is provided by a single simulation. Moreover, the uncertainty ranges generate additional pieces of information that often highlight the differences between the models.


\subsection{Analysis Process}
Following the approach in \cite{Patra2018b, Bevilacqua2019}, we define $\left(M(A),\mathcal P_{M(A)}\right)$, where $A$ is a set of assumptions, $M(A)$ is the model composed of those assumptions, and $\mathcal P_M$ is a probability measure in the input space of $M$. In this study we are considering a measure $\mathcal P_M$ defined through a prior choice of the input domain. In general, $\mathcal P_M$ could be obtained after a calibration step, or become a single value after the solution of an inverse problem for the optimal reconstruction of a particular flow. This could reduce the predictive capabilities of the model, where we have to investigate the outcomes over a nontrivial and relatively wide input space. However, our approach can be easily implemented after a more careful input space refinement (e.g. \cite{Bevilacqua2019}.

Our problem cannot be solved using classical sensitivity analysis (e.g. \cite{Saltelli2010}, \cite{Weirs2012}), which decomposes the variance of model output with respect to the input parameters. Indeed, model assumptions cannot be seen as input parameters, because they are related to the terms in the governing equations. These terms can be seen as random variables depending on the inputs, but they have an unknown probability distribution and are not independent. In the sequel we will define the new concepts of dominance factors and expected contributions to cope with this problem. %These and other statistics can now be compared to determine the need for different modeling assumptions and the relative merits of different models. Thus, analysis of the output data gathered over the entire range of flows for the state variables and outcomes leads to a quantitative basis for accepting or rejecting particular assumptions or models for specific outcomes.
Essentially, if the assumptions represent the atomic elements constituting the models, then the dominance factors will tell us which atomic element is the \emph{most important} in the model, and the contributing variables will quantitatively determine the full atomic decomposition of the model, through space and time.

%\cite{Gilbert91} defines science as a process of constructing predictive conceptual models where these models represent
%consistent predictive relations in target systems. A simpler definition of a model that is more appropriate for the context of this study is that:
%
%\begin{quote}{\it A model is a representation of a postulated relationship among inputs and outputs of a system, informed by observation, and based on an hypothesis that best explains the relationship.}\end{quote} The definition captures two of the most important characteristics
%\begin{itemize}
%\item models depend on a {\it hypothesis}, and,
%\item models use the {\it data from observation} to validate and refine the hypothesis.
%\end{itemize}
%Errors and uncertainty in the data and limitations in the hypothesis (usually a tractable and computable mathematical construct articulating beliefs like proportionality, linearity, etc.) are immediate challenges that must be overcome, to construct useful and credible models.

The investigation of contributing variables illustrates the impact of modeling assumptions. Furthermore, understanding which assumptions dominate, and by how much, is a key step towards enabling the construction of more efficient models for desired inputs. %Such model composition has the ultimate purpose of bypassing the search for a unique best model, and going beyond a simple mixture of alternative models.
We summarize our analysis process in two steps.
%
\paragraph{Stage 1: Setup of Parameter Ranges} In this study, we assume:
$$\mathcal P_M\left(p_1,\dots,p_{N_M}\right)\sim \bigotimes_{i=1}^{N_M} Unif(a_{i,M},b_{i,M}),$$
where $N_M$ is the number of parameters of $M$. This is not restrictive, and in case of correlation between the commonly used parameters $(\hat p_j)_{j=1,\hat N_M}$, or non-uniform distributions, we can always define a function $g$ such that $g[(p_i)_{j=1, N_M}]=(\hat p_j)_{j=1,\hat N_M}$, and the $(p_i)$ are independent and uniformly distributed. In particular, we choose these parameter ranges trough an explorative testing for physical consistency of model outcomes and range of inputs/outcomes of interest, and using information collected from the literature. {This step is critical, because if the statistical comparison is dominated by trivial macroscopic differences, it cannot focus on the rheology details.}  In the preparation of hazard analysis, expert elicitation processes can be used to ensure that the studies correctly account for all anticipated and possible flow regimes.
%

\paragraph{Stage 2 Simulations and Output Data Gathering}
For each model $M$, we produce datasets of \emph{model inputs}, \emph{contributing variables} and \emph{model outputs}. These concepts are introduced in \cite{Patra2018b}, and briefly reported in \cite{Bevilacqua2019}. The \emph{model outputs} could be flow height, lateral extent, area, velocity, acceleration, and derived quantities such as Froude number $Fr$. In general they include any explicit outcome of the flow calculations. Instead, the \emph{contributing variables} include quantities that are related to individual assumptions $A_i$, typically not observed in the outputs of the model. For example these could be values of different source terms in momentum balances of complex flow calculations, or dissipation terms, or inertia terms. We base our analysis on a Monte Carlo simulation, sampling the model inputs and producing a family of graphs for the expectation of the contributing variables and model outputs. We also include their 5$^{\mathrm{th}}$ and 95$^{\mathrm{th}}$ percentiles. The sampling technique of the input variables follows \cite{Patra2018b}. It is based on the Latin Hypercube Sampling idea \citep{McKay1979,Owen1992b,Stein1987,Ranjan2014,Mingyao2016}, and in particular, on the improved space-filling properties of the orthogonal array-based Latin Hypercubes \citep{Owen1992a,Tang1993}. The volume of output data generated is likely to be large but modern computing and data handling equipment readily available to most modeling researchers \footnote{We thank the University at Buffalo Center for Computing Research} in university and national research facilities are more than adequate.

\subsection{Monte Carlo Process and Statistical Analysis}
In this study, like in \cite{Bevilacqua2019}, the flow range is defined by establishing boundaries for inputs like flow volume or rheology coefficients characterizing the models.  Latin Hypercube Sampling is performed over $[0,1]^d$ where $d$ depends on the number of  input parameters. Those scale-less samples are thus linearly  transformed to span the required intervals. Section \ref{lhs_des_colima} provides examples of Latin Hypercube design in the three models that are targets of this study, with respect to their commonly used parameters.

Through the Monte Carlo simulation, we calculate data for each sample run and each output or contributing variable $f(\underline{\textbf x},t)$ described as a function of time on the elements of the computational grid. This analysis produces very large volume of data which then has to be processed utilizing statistical methods for summative impact. %The contributing variables in this case are the mass and force terms in the conservation laws defined above.

We devise many statistical measures for analyzing the data, following the definitions in \cite{Patra2018b} reported in this study for the sake of completeness. In particular, let $(F_i(\underline{\textbf x},t))_{i=1,\dots, k}$ be an array of force terms, where $\underline{\textbf x}\in \mathbf R^d$ is a spatial location, and $t\in T$ is a time instant. The degree of contribution of those force terms to the flow dynamics can be significantly variable in space and time, and we define the \emph{dominance factors} $[P_j(\underline{\textbf x},t)]_{j=1,\dots, k}$, i.e., the probability of each $F_j(\underline{\textbf x},t)$ to be the dominant force. Those probabilities provide insight into the dominance of a particular source or dissipation term on the model dynamics. We remark that we focus on the modulus of the forces and hence we cope with scalar terms. It is also important to remark that all the forces depend on the input variables, and they can be thus considered as random variables. Furthermore, these definitions are general and could be applied to any set of contributing variables, and not only to the force terms.

\begin{definition}[Dominance factors]
Let $(F_i)_{i=1,\dots, k}$ be random variables on $(\Omega, \mathcal F,\mathcal P_M)$. Then, $\forall i$, the dominant variable is defined as:
$$\Phi:=\max_i |F_i|.$$
In particular, for each $j =1,\dots, k$, the dominance factors are defined as:
$$P_j:=\mathcal P_M\left\{\Phi=|F_j|\right\}.$$
\end{definition}

We remark that the dominant variable $\Phi$ is also a random variable, and in particular it is a stochastic process parameterized in time and space. Moreover, we define the \emph{random contributions}, an additional tool that we use to compare the different force terms, following a less restrictive approach than the dominance factors. They are obtained dividing the force terms by the dominant force $\Phi$, and hence belong to $[0,1]$.

\begin{definition}[Expected contributions]
Let $(F_i)_{i=1,\dots, k}$ be random variables on $(\Omega, \mathcal F, \mathcal P_M)$. Then, $\forall i$, the random contribution is defined as:
$$C_i:=\left\{
\begin{array}{ll}
      \frac{F_i}{\Phi}, & \textrm{if }\Phi\neq 0; \\
      0, & \hbox{otherwise.}
    \end{array}
  \right.$$
where $\Phi$ is the dominant variable. Thus, $\forall i$, the expected contributions are defined by $E\left[C_i\right]$.
\end{definition}

Thus, for a particular location $x$, time $t$, and parameter sample $\omega$, we have $C_i(\underline{\textbf x},t,\omega)=0$ if there is no flow or all the forces are null. The expectation of $C_i$ is reduced by the chance of $F_i$ being small compared to the other terms, or by the chance of having no flow in $(\underline{\textbf x},t)$. Expected contributions are obtained after diving the force terms by the dominant variable $\Phi$, which is an unknown quantity depending on time, location, and input parameters. We provide an additional result, further explaining the meaning of those contributions through the conditional expectation. The proof is an easy consequence of the rule of chain expectation.

\begin{proposition}
Let $(F_i)_{i=1,\dots, k}$ be random variables on $(\Omega, \mathcal F, \mathcal P)$. For each $i$, let $C_i$ be the random contribution of $F_i$. Then we have the following expression:
$$E[C_i]=\sum_j P_j \mathbb E\left[\frac{F_i}{|F_j|}\ \Big{|}\ \Phi=|F_j|\right],$$
where $P_j:=\mathcal P\left\{\Phi=|F_j|\right\}$.
\end{proposition}


\subsection{Modeling of Geophysical Mass Flows}\label{subsec:FlowTypes}
Models that are computationally tractable are rarely able to capture the physics of the complex flow class of large scale dense granular avalanches. In addition, very often the only actual information available is the a posteriori deposit left by the flow, with sparse data and significant uncertainty affecting the mechanisms of flow initiation and propagation. This modeling task is challenging and the subject of continuing research. Due to intrinsic mathematical, physical, or numerical issues, models that appear to reproduce well the flows in certain conditions, may turn out to be poor in others. However, because of the high consequences of such flows, several models composed of different assumptions have been proposed. For example in \citep{Iverson1997, Iverson2001, Denlinger2001, Pitman2003a, Denlinger2004, Iverson2004,dMichieliVitturi2019}, the depth-averaged model was applied in the simulation of test geophysical flows in large scale experiments. Several studies were specifically devoted to the modeling of volcanic mass flows \citep{Bursik2005,Kelfoun2005,Macias2008,Kelfoun2009,Charbonnier2013}. In fact, volcanos are great sources for a rich variety of geophysical flow types and provide field data from past flow events.

We first assume that the laws of mass and momentum conservation hold for properly defined system boundaries. After this, because these flows are typically very long and wide with small depth we assume their {\it shallowness} \citep{SavageHutter1989}. This enables the calculation of simpler and more computationally tractable equations, by integrating through the flow depth. Both these assumptions, conservation laws and shallowness, could be investigated with the procedure defined above. Nevertheless, there is much evidence in the literature for their validity and we do not test them in this study.

The depth-averaged Saint-Venant equations that result are:
\begin{eqnarray}
\label{eq:D_A}
\frac{\partial h}{\partial t} +
\frac{\partial}{\partial x}(h \bar{u}) +
\frac{\partial}{\partial y}(h\bar{v}) &=& 0 \nonumber \\
\frac{\partial}{\partial t} (h\bar{u}) +
\frac{\partial}{\partial x}\left(h\bar{u}^2 + \frac{1}{2}k g_{z}h^2\right) + \frac{\partial}{\partial y}(h\bar{u}\bar{v}) &=& S_{x}\\
\frac{\partial}{\partial t} (h\bar{v}) +
\frac{\partial}{\partial x}(h\bar{u}\bar{v}) +
\frac{\partial}{\partial y}\left(h\bar{v}^2 + \frac{1}{2}k g_{z}h^2\right) &=& S_{y} \nonumber
\end{eqnarray}
Like in \cite{Patra2005}, here the Cartesian coordinate system is aligned such that $z$ is normal to the surface; $h$ is the flow height in the $z$ direction; $h\bar{u}$ and $h\bar{v}$ are respectively the components of momentum in the $x$ and $y$ directions; and the earth pressure coefficient $k$ relates the lateral stress components, $\bar{\sigma}_{xx}$ and $\bar{\sigma}_{yy}$, to the normal stress component, $\bar{\sigma}_{zz}$. %The definition of this coefficient depends on the constitutive model of the flowing material we choose.
We remark that $\frac{1}{2} k g_z h^2$ is the contribution of depth-averaged pressure to the momentum fluxes. $S_x$ and $S_y$ are the sum local stresses: as described in \cite{Patra2018b} they include the gravitational driving forces, the basal friction force resisting to the motion of the material, and additional forces specific of rheology assumptions.

The class of assumptions that we specifically test in this study are the assumptions on the rheology of the flows. In particular they model the different dissipation mechanisms embedded in $S_x, S_y$, and that cause an abundance of models with much controversy on the most suitable model. %We focus here on three models composed of different assumptions for essentially the same class of flows.

We will define our approach and illustrate it using three models for large scale mass flows incorporated in our large scale mass flow simulation framework  TITAN2D \citep{Patra2005,Patra2006, Yu2009, Aghakhani2016}. The description of the models is summarized from \cite{Bevilacqua2019} and reported in this study for the sake of clarity.
So far, TITAN2D has been successfully applied to the simulation of different geophysical mass flows with specific characteristics \citep{Sheridan2005, Rupp2006, Norini2008, Charbonnier2009, Procter2010, Sheridan2010, Sulpizio2010, Capra2011}. Several studies involving TITAN2D were also directed towards a statistical study of geophysical flows, focusing on uncertainty quantification \citep{Dalbey2008, Dalbey2009, Stefanescu2012b, Stefanescu2012a}, or on the more efficient production of hazard maps \citep{Bayarri2009, Spiller2014, Bayarri2015, Ogburn2016, Tierz2018, Bevilacqua2019, Hyman2019, Rutarindwa2019}.

In the three following sections, we summarize \emph{Mohr-Coulomb} (MC), \emph{Pouliquen-Forterre} (PF) and \emph{Voellmy-Salm} (VS) models. Models based on additional heterogeneous assumptions are possible, either more complex \citep{PitmanLe2005,Iverson2014} or more simple \citep{DadeHuppert1998}. We decided to focus on these three because of their popularity. Moreover, if the degree of complexity in the models is significantly different, model comparison should take that into account, but this is outside the purpose of this study \citep{Farrell2015}.

\subsubsection{Mohr-Coulomb model}\label{MCM}
Based on the long history of studies in soil mechanics \citep{Rankine1857,DruckerPage52}, the Mohr-Coulomb rheology (MC) was developed and used to represent the behavior of geophysical mass flows \citep{SavageHutter1989}.

Shear and normal stress are assumed to obey Coulomb friction equation, both within the flow and at its boundaries. In other words,
\begin{equation}
\tau = \sigma \tan \phi,
\end{equation}
where $\tau$ and $\sigma$ are respectively the shear and normal stresses on failure surfaces, and $\phi$ is a friction angle. This relationship does not depend on the flow speed.

We can summarize the MC rheology assumptions as:
\begin{itemize}
\item \textit{Basal Friction} based on a constant friction angle.

\item \textit{Internal Friction} based on a constant friction angle.

\item \textit{Earth pressure coefficient} formula depends on the Mohr circle (it can be $0$ or $\pm 1$).

\item Velocity based \textit{curvature effects} are included into the equations.
\end{itemize}

%Under the assumption of symmetry of the stress tensor with respect to the \textit{z} axis, the earth pressure coefficient $k=k_{ap}$ can take on only one of three values $\{ 0, \pm 1\}$. The material yield criterion is represented by the two straight lines at angles $\pm \phi$ (the internal friction angle) relative to horizontal direction. Similarly, the normal and shear stress at the bed are represented by the line $\tau=-\sigma \tan(\delta)$ where $\delta$ is the bed friction angle.

\paragraph{MC equations} As a result, we can write down the source terms of the Eqs. (\ref{eq:D_A}):
\begin{eqnarray}\label{S_terms_MC}
S_x =& g_x h  - \frac{\bar{u}}{\| \underset{^\sim}{\bar{\textbf u}} \|} \left[h\left(g_z+\frac{\bar{u}^2}{r_x}\right)\tan(\phi_{bed})\right] - h k_{ap} \ {\rm sgn}\left(\frac{\partial \bar{u}}{\partial y}\right) \frac{\partial (g_z h)}{\partial y} \sin(\phi_{int}) \nonumber \\
 S_y =& g_y h  - \frac{\bar{v}}{\| \underset{^\sim}{\bar{\textbf u}} \|} \left[h\left(g_z +\frac{\bar{v}^2}{r_y}\right)\tan(\phi_{bed})\right] - h k_{ap} \ {\rm sgn}\left({\frac{\partial \bar{v}}{\partial x}}\right) \frac{\partial (g_z h)}{\partial x} \sin(\phi_{int})
\end{eqnarray}
Where, $\underset{^\sim}{\bar{\textbf u}} = (\bar{u} , \bar{v})$, is the depth-averaged velocity vector, $r_x$ and $r_y$ denote the radii of curvature
of the local basal surface. The inverse of the radii of curvature is usually approximated with the partial derivatives of the basal slope, e.g., $1/r_x = \partial \theta_x/\partial x$, where $\theta_x$ is the local bed slope.

In our study, sampled input parameters are $\phi_{bed}$, and $\Delta \phi:=\phi_{int}-\phi_{bed}$. In particular, we assumed $\Delta \phi \in [2^{\mathrm{\circ}}, 10^{\mathrm{\circ}}]$ \citep{Dalbey2008}.

\subsubsection{Pouliquen-Forterre model}\label{PFM}
The scaling properties for granular flows down rough inclined planes led to the development of the Pouliquen-Forterre rheology (PF), assuming a variable frictional behavior as a function of Froude Number and flow depth \citep{Pouliquen1999, ForterrePouliquen2002, PouliquenForterre2002, ForterrePouliquen2003}.

PF rheology assumptions can be summarized as:
\begin{itemize}
\item \textit{Basal Friction} is based on an interpolation of two different friction angles, based on the flow regime and depth.

\item \textit{Internal Friction} is neglected.

\item \textit{Earth pressure coefficient} is equal to one.

\item Normal stress is modified by a \textit{pressure force} linked to the thickness gradient.

\item Velocity based \textit{curvature effects} are included into the equations.
\end{itemize}

%Two critical slope inclination angles are defined as functions of the flow thickness, namely $\phi_{start}(h)$ and $\phi_{stop}(h)$. The function $\phi_{stop}(h)$ gives the slope angle at which a steady uniform flow leaves a deposit of thickness $h$, while $\phi_{start}(h)$ is the angle at which a layer of thickness $h$ is mobilized. They define two different basal friction coefficients.
%\begin{eqnarray}
%\mu_{start}(h)=\tan(\phi_{start}(h))\\
%\mu_{stop}(h)=\tan(\phi_{stop}(h))
%\end{eqnarray}

An empirical friction law $\mu_{b}(\|\underset{^\sim}{\bar{\textbf{u}}} \| , h)$ is defined in the whole range of velocity and thickness. The expression changes depending on two flow regimes, according to a parameter $\beta$ and the Froude number $Fr=\| \underset{^\sim}{\bar{\textbf{u}}} \| / \ \sqrt{h g_{z}}$. The critical angles $\phi_{1}$ and $\phi_{2}$, and the quantities $\mathcal{L}, \beta$ are the parameters of the model. In particular, $\mathcal{L}$ is the characteristic depth of the flow over which a transition between the angles $\phi_{1}$ to $\phi_{2}$ occurs. More details can be found in \cite{Bevilacqua2019}. %In practice, if $h\ll \mathcal L$, then $\mu_{stop}(h)\approx \tan\phi_{2}$, and if $h\gg \mathcal L$, then $\mu_{stop}(h)\approx\tan\phi_{1}$.

%\paragraph{Dynamic friction regime - $Fr \ge \beta$}
%\begin{equation}\label{mu_beta1}
%\mu(h, Fr)=\mu_{stop}(h \beta / Fr)
%\end{equation}

%\paragraph{Intermediate friction regime - $0 \le Fr < \beta$}
%\begin{equation}\label{mu_beta2}
%\mu(h, Fr)=\left(\frac{Fr}{\beta}\right)^\gamma [\mu_{stop}(h)-\mu_{start}(h)] + \mu_{start}(h),
%\end{equation}
%where $\gamma$ is the power of extrapolation, assumed equal to $10^{-3}$ in the sequel \citep{PouliquenForterre2002}.
%
%The functions $\mu_{stop}$ and $\mu_{start}$ are defined by:
%\begin{equation}\label{mu-stop}
%\mu_{stop}(h)=\tan\phi_{1} + \frac{\tan\phi_{2}-\tan\phi_{1}}{1+h/\it \mathcal{L}}
%\end{equation}
%and
%\begin{equation}\label{mu-start}
%\mu_{start}(h)=\tan\phi_{3} + \frac{\tan\phi_{2}-\tan\phi_{1}}{1+h/\it \mathcal{L}}
%\end{equation}

\paragraph{PF equations} The depth-averaged Eqs. (\ref{eq:D_A}) source terms thus take the following form:
\begin{eqnarray}\label{eq:S_terms_PF}
S_{x} &=&  g_{x} h -  \frac{\bar{u}}{\| \underset{^\sim}{\bar{\textbf{u}}} \|}\left[h \left(g_z+\frac{\bar{u}^2}{r_x}\right) \ \mu_{b}(\|\underset{^\sim}{\bar{\textbf{u}}} \| , h)\right] \ + g_{z}h\frac{\partial h}{\partial x} \nonumber \\
S_{y} &=&  g_{y} h - \frac{\bar{v}}{\| \underset{^\sim}{\bar{\textbf{u}}} \|}\left[h \left(g_z +\frac{\bar{v}^2}{r_y}\right) \ \mu_{b}(\|\underset{^\sim}{\bar{\textbf{u}}} \| , h)\right] \ + g_{z}h\frac{\partial h}{\partial y}
\end{eqnarray}

In our study, sampled input parameters are $\phi_1$, $\Delta \phi_{12}:=\phi_2-\phi_1$, and $\beta$. In particular, we assumed $\Delta \phi_{12} \in [10^{\mathrm{\circ}}, 15^{\mathrm{\circ}}]$, and $\beta \in [0.1, 0.85]$. Moreover, %$\phi_3=\phi_1+1^\mathrm{\circ}$, and
$\mathcal{L}$ is equal to $1 dm$ \citep{PouliquenForterre2002,ForterrePouliquen2003}.

\subsubsection{Voellmy-Salm model}\label{VSM}
The theoretical analysis of dense snow avalanches led to the VS rheology (VS) \citep{Voellmy1955, Salm1990, Salm1993, Bartelt1999}. Dense snow or debris avalanches consist of mobilized, rapidly flowing ice-snow mixed to debris-rock granules \citep{BarteltMcArdell2009}. The VS rheology assumes a velocity dependent resisting term in addition to the traditional basal friction, ideally capable of including an approximation of the turbulence-generated dissipation. Many experimental and theoretical studies were developed in this framework \citep{Gruber2007, Kern2009, Christen2010, Fischer2012}.

The following relation between shear and normal stresses holds:
\begin{equation}
\tau = \mu \sigma + \frac{\rho \| \underline{\textbf g} \|}{\xi} \| \underset{^\sim}{\bar{\textbf u}} \|^2,
\end{equation}
where, $\sigma$ denotes the normal stress at the bottom of the fluid layer and $\underline{\textbf g} = (g_{x} , g_{y} , g_{z})$ represents the gravity vector. The two parameters of the model are the bed friction coefficient $\mu$ and the velocity dependent friction coefficient $\xi$.

We can summarize VS rheology assumptions as:
\begin{itemize}
\item \textit{Basal Friction} is based on a constant coefficient, similarly to the MC rheology.

\item \textit{Internal Friction} is neglected.

\item \textit{Earth pressure coefficient} is equal to one.

\item Additional \textit{turbulent friction} is based on the local velocity by a quadratic expression.

\item Velocity based \textit{curvature effects} are included into the equations, following a different formulation from the previous models.
\end{itemize}

%The effect of the topographic local curvatures is addressed with terms containing the local radii of curvature $r_x$ and $r_y$. In this case the expression is based on the speed instead of the scalar components of velocity \citep{PudasainiHutter2003,Fischer2012}.

\paragraph{VS equations} Therefore, the final source terms take the following form:
\begin{eqnarray}
\label{eq:S_terms_VS}
S_{x} &=&  g_{x} h - \frac{\bar{u}}{\| \underset{^\sim}{\bar{\textbf u}}\|} \ \left[ h \left(g_{z} + \frac{\| \underset{^\sim}{\bar{\textbf u}} \|^2}{r_{x}} \right)\mu+ \frac{\| \underset{^\sim}{\textbf g} \|}{\xi}\| \underset{^\sim}{\bar{\textbf u}} \|^2\right], \nonumber \\
S_{y} &=& g_{y} h - \frac{\bar{v}}{\| \underset{^\sim}{\bar{\textbf u}}\|} \ \left[ h \left(g_{z} + \frac{\| \underset{^\sim}{\bar{\textbf u}} \|^2}{r_{y}} \right)\mu+ \frac{\| \underset{^\sim}{\textbf g} \|}{\xi}\| \underset{^\sim}{\bar{\textbf u}} \|^2\right].
\end{eqnarray}

In our study, sampled input parameters are $\mu$, and $\xi$. In particular, $\xi$ uniform sampling is accomplished in log-scale. In fact, values of $\xi$ between 250 and 4,000 $m/s^2$ have been described for snow avalanches \citep{Salm1993,Bartelt1999,Gruber2007}.

\subsection{Contributing Variables}\label{sec:Fterms}
For analysis of modeling assumptions we need to record and classify the results of different modeling assumptions. In our case study, we focus on the right-hand side terms in the momentum equation and we call them RHS forces, or, more simply, the force terms. They are internal to the computation and rarely visible as a system output.
\begin{align}
\boldsymbol{RHS_1} = [g_x h,g_y h],
\end{align}
it is the gravitational force term, it has the same formulation in all models.

The expression of {\bf basal friction force} $\boldsymbol{RHS_2}$ depends on the model:
\begin{align}
\boldsymbol{RHS_2} =& -h g_z\tan(\phi_{bed})\left[\frac{\bar{u}}{\| \underset{^\sim}{\bar{\textbf u}} \|}, \frac{\bar{v}}{\| \underset{^\sim}{\bar{\textbf u}} \|} \right],\textmd{ in MC model.}\nonumber\\
\boldsymbol{RHS_2} =& - h g_z \ \mu_{b}(\|\underset{^\sim}{\bar{\textbf{u}}} \| , h)\left[\frac{\bar{u}}{\| \underset{^\sim}{\bar{\textbf{u}}} \|}, \frac{\bar{v}}{\| \underset{^\sim}{\bar{\textbf{u}}} \|}\right],\textmd{ in PF model.}\\
\boldsymbol{RHS_2} =& -h g_{z} \mu\left[\frac{\bar{u}}{\| \underset{^\sim}{\bar{\textbf u}}\|} , \frac{\bar{v}}{\| \underset{^\sim}{\bar{\textbf u}}\|}\right],\textmd{ in VS model.}\nonumber
\end{align}

The expression of the force related to the {\bf topography curvature}, $\boldsymbol{RHS_3}$, also depends on the model:
\begin{align}
\boldsymbol{RHS_3} =&-h \tan(\phi_{bed})\left[\frac{\bar{u}^3}{r_x\| \underset{^\sim}{\bar{\textbf{u}}} \|}, \frac{\bar{v}^3}{r_y\| \underset{^\sim}{\bar{\textbf{u}}} \|}\right],\textmd{ in MC model.}\nonumber\\
\boldsymbol{RHS_3} =& -h\ \mu_{b}(\|\underset{^\sim}{\bar{\textbf{u}}} \|,h)\left[\frac{\bar{u}^3}{r_x\| \underset{^\sim}{\bar{\textbf{u}}} \|}, \frac{\bar{v}^3}{r_y\| \underset{^\sim}{\bar{\textbf{u}}} \|}\right],\textmd{ in PF model.}\\
\boldsymbol{RHS_3} =& -h \mu\left[\frac{\bar{u}\| \underset{^\sim}{\bar{\textbf u}} \|}{r_{x}},\frac{\bar{v}\| \underset{^\sim}{\bar{\textbf u}} \|}{r_{y}}\right],\textmd{ in VS model.}\nonumber
\end{align}

All the three models have an additional force term, having a different expressions and different meaning in the three models:
\begin{align}
\boldsymbol{RHS_4} =&  - h k_{ap}\sin(\phi_{int})\left[ \ {\rm sgn}(\frac{\partial \bar{u}}{\partial y}) \frac{\partial (g_z h)}{\partial y},\ {\rm sgn}({\frac{\partial \bar{v}}{\partial x}}) \frac{\partial (g_z h)}{\partial x}\right],\textmd{ in MC model.}\nonumber\\
\boldsymbol{RHS_4} =& g_{z}h\left[\frac{\partial h}{\partial x}, \frac{\partial h}{\partial y}\right],\textmd{ in PF model.}\\
\boldsymbol{RHS_4} =& -\frac{\| \underset{^\sim}{\textbf g} \|}{\xi}\| \underset{^\sim}{\bar{\textbf u}} \|^2\left[\frac{\bar{u}}{\| \underset{^\sim}{\bar{\textbf u}}\|} \ ,\frac{\bar{v}}{\| \underset{^\sim}{\bar{\textbf u}}\|}\right],\textmd{ in VS model.}\nonumber
\end{align}
These contributing variables can be analyzed locally and globally for discriminating among the different modeling assumptions. We remark that a complete representation of the model functional should include also the left hand side (LHS) terms. For instance, the lateral stress component $k_{ap}$ could be influenced by bed and internal frictional coefficients, and it appears in the LHS terms \citep{Gray1999,Pirulli2007}. Our statistical approach could be easily extended to the LHS terms. We did not focus on them for the sake of simplicity.

Finally, we also study the spatial integrals defined by $F(t)=\int_{\mathbb R^k}f(\underline{\textbf x},t) d\underline{\textbf x}$, where $d\underline{\textbf x}$ is the area of the mesh elements. This provides a global view of the results and is complementary to the observations taken locally. For instance, by integrating the scalar product of source terms in the momentum balance and velocity we can compare the relative importance of modeling assumptions when we seek accuracy on global quantities.

\section{Case Study and Results}\label{QoI2}
Our case study is a pyroclastic flow down the SW slope of Volc{\'a}n de Colima (MX) - an andesitic stratovolcano that rises to 3,860 m above sea level, situated in the western portion of the Trans-Mexican Volcanic Belt (Fig. \ref{fig:Colima-first}). Volc{\'a}n de Colima has historically been the most active volcano in M{\'e}xico \citep{DeLaCruzReina1993, Zobin2002, Gonzalez2002}.

Pyroclastic flows generated by explosive eruptions and lava dome collapses of Volc{\'a}n de Colima are a well studied topic \citep{DelPozzo1995,Sheridan1995,Saucedo2002,Saucedo2004,Saucedo2005,Sarocchi2011,Capra2015}. The presence of a change in slope and multiple ravines characterize the SW slope of the volcano. Volc{\'a}n de Colima has been used as a case study in several research papers involving the Titan2D code \citep{Rupp2004, Rupp2006, Dalbey2008, Yu2009, Sulpizio2010, Capra2011, Aghakhani2016}. On July 10$^{\mathrm{th}}$-11$^{\mathrm{th}}$, 2015, the volcano underwent its most intense eruptive phase since its Subplinian-Plinian 1913 AD eruption \citep{Saucedo2010, Zobin2015, ReyesDaVilla2016, Capra2016, Macorps2018}. We assume the flow to be generated by the gravitational collapse of a lava dome represented by a material pile placed close to the summit area - at 644956N, 2157970E UTM13 \citep{Rupp2006,Aghakhani2016}. A lava dome collapse occurs when there is a significant amount of recently-extruded highly-viscous lava piled up in an unstable configuration. Further extrusion and/or externals forces can cause the still hot dome of viscous lava to collapse, disintegrate, and avalanche downhill \citep{Bursik2005, Wolpert2016, Hyman2018}. The volcano produced several pyroclastic flows of this type, called Merapi style flows \citep{Macorps2018}. The hot, dense blocks in this ``block and ash'' flow (BAF) will typically range from centimeters to a few meters in size.

The rheology of volcanic rock avalanches and dense pyroclastic flows is complex, and it is difficult to constrain the physics of the processes. A priori predictive ability of the known block and ash flow models is limited by inability to tune without knowledge of flow character \citep{Patra2005}. Since these dense flows are constituted of blocks, ash and gas, friction between the particles during emplacement could confer a Coulomb behavior to the whole flow. So, the Mohr Coulomb model was often considered appropriate, sometimes associated with a velocity or volume-dependent term \citep{Spiller2014}. However, a simple Coulomb behavior is not ideal, whatever the value of friction angle used. A friction angle that varies according to the velocity and thickness of the flows has also been assumed in the simulations of natural flows \citep{Kelfoun2011}. For these reason all three models, MC, VS and PF, are a priori appropriate but not ideal for a block and ash flow. Our new quantitative approach provides a tool in the evaluation of what model best characterizes the flow.

Our computations were performed on a DEM of 5m-pixel resolution, obtained from Laser Imaging Detection and Ranging (LIDAR) data acquired in 2005 \citep{Davila2007, Sulpizio2010}. We placed 51 locations along the flow inundated area to accomplish local testing. After evaluating the results in all the locations, six of them are adopted as preferred locations, being representative of different flow regimes.
\begin{figure}[H]
    \includegraphics[width=0.90\textwidth]{ColimaFig.jpg}
    \centering
    \caption{(a) Volc{\'a}n de Colima (M{\'e}xico) overview, including 51 numbered locations (stars) and major ravines. Initial pile is marked by a blue dot. Coordinates are in UTM zone 13N. (b) Regional geology map. (c) Digital elevation map. Six preferred locations are colored in yellow. Elevation isolines are displayed in blue, elevation values in red.}
    \label{fig:Colima-first}
\end{figure}

\subsection{Preliminary Consistency Testing of the Input Ranges}
In this same setting, \cite{Dalbey2008} assumed $\phi_{bed}=[15^\mathrm{\circ}, 35^\mathrm{\circ}]$, while \cite{Capra2011} adopted $\phi_{bed}=30^\mathrm{\circ}$. Then, \cite{Spiller2014,Bayarri2015,Ogburn2016} found a statistical correlation between flow size and effective basal friction inferred from field observation of geophysical flows. A BAF at the scale of our simulations would possess $\phi_{bed}=[13^\mathrm{\circ}, 18^\mathrm{\circ}]$ according to their estimates. Small changes in the parameter ranges do not change significantly the results.

Figure \ref{Colima-MaxMinExtents} displays the maps of maximum flow height observed in the extreme cases tested. Simulation options are - max\_time = 7200 s (2 hours), height/radius = 0.55, length\_scale = 4 km, number\_of\_cells\_across\_axis = 50, order = first, geoflow\_tiny = $1.0\times 10^{-4}$ \citep{Patra2005,Aghakhani2016}. Initial pile geometry is paraboloid. Even if the maximum runout is matched between the models, they display significantly different macroscopic features. In particular, MC displays a further distal spread before entering the ravines, PF shows a larger angle of lateral spread at the initiation pile, and stops more gradually than MC with more complex inundated area boundary lines. VS is less laterally extended and the material reaches higher thickness. The flow generally looks significantly channelized, and displays several not-inundated spots due to minor topographical coul\'{e}es.

\begin{figure}[H]
         \centering
        \includegraphics[width=0.90\textwidth]{ExtremeMaps.jpg}
        \caption{Volc\'an de Colima - comparison between \emph{max flow height} maps of simulated flow, assuming MC (a),(d), PF (b),(e), and VS (c),(f) models. Extreme cases - (a),(b),(c) \emph{\textbf{max. volume -- min. resistance}} and (d),(e),(f) \emph{\textbf{min. volume -- max. resistance}}.}
        \label{Colima-MaxMinExtents}
\end{figure}

\begin{itemize}
\item \textbf{Material Volume:} $[2.08, 3.12] \times 10^5 \ m^3$, i.e. average of $2.6  \times 10^5 \ m^3$ and uncertainty of $\pm 20\%$.
\item \textbf{Rheology models' parameters:}
\par\noindent \textbf{MC} - $\phi_{bed} \in [10^{\mathrm{\circ}}, 25^{\mathrm{\circ}}]$.

\vskip.1cm\noindent \textbf{PF} - $\phi_1 \in [8^{\mathrm{\circ}}, 18^{\mathrm{\circ}}]$.

\vskip.1cm\noindent \textbf{VS} - $\mu \in [0.15, 0.45]$, $\quad \log(\xi) \in [1.7, 4]$.
\end{itemize}

We adopt a Latin Hypercube Sampling based on an orthogonal array $OA(s^d,d,s,d)$ \citep{Patra2018}. We take $s=8$ for the 3-dimensional designs over the parameter space of Mohr-Coulomb and Voellmy-Salm models, i.e. $512$ points; we took $s=6$ for the 4-dimensional designs over the more complex parameter space of the Pouliquen-Forterre model, i.e. 1296 points.

\subsection{Exploring Flow Limits} \label{lhs_des_colima}
Figure \ref{fig:Colima-CC1}a,b,c illustrates the sampling design of our simulations. Figure \ref{fig:Colima-CC1}d,e,f,g,h,i show examples of the contributions obtained assuming parameter values at the extremes of their range.
\begin{figure}[H]
         \centering
        \includegraphics[width=0.90\textwidth]{SensitivityFigure.jpg}
        \caption{(a-c) Example of Latin Hypercube Sampling design, Volc{\'a}n de Colima case study. Colored stars mark the values producing the minimum and maximum friction. The parameter values are projected with respect to their volume value. (d-i) Plots of random contributions at specific locations, obtained with (d,e) MC, (f,g) PF, and (h,i) VS model. Each plot includes two graphs. One refers to a proximal location to the initial pile (left), and the other to a more distal location (right). The dominant variable is expressed by the dots on the top line, $C_i=1$. Point numbers refer to Fig.\ref{fig:Colima-extra}. Different colors correspond to different force terms.}
        \label{fig:Colima-CC1}
\end{figure}
Data is inherently discontinuous due to the mesh modification, and it is reported with colored dots. If the mesh element which contains the considered spatial location changes, then the force term is calculated on a different region and suddenly changes too. This can also affect the dominant variable, and more than one random contribution can incorrectly appear to be at unity at the same time. However, it is evident that the dynamics and its temporal scale is evolving, and that the contributions can reveal a large amount of information about it. We remark that, $\forall i$, the calculation of $\mathbb E[C_i]$ with respect to $\mathcal P_M$ removes the effects of data discontinuity, and hence this is a fundamental step in our further analysis. We note that the above choices are easily changed, and if we are interested for instance in the performance of the models for very large or very small flows, a suitable volume range can be chosen and the procedure re-run.

\subsection{Observable Outputs}
The number of spatial locations is significantly high. We placed 51 points to span the entire inundated area, in search of different flow regimes, as displayed in Fig. \ref{fig:Colima-first}. These locations have an explorative purpose, whereas the six preferred locations will describe distinct flow regimes. We remark that all the distances reported in the following are measured in vertical projection, thus without considering the differences in elevation.

\begin{figure}[H]
         \centering
        \includegraphics[width=0.95\textwidth]{HeightMC_BAF.png}
        \caption{MC model, mean flow height $h(L,t)$ in 51 numbered locations (Fig. \ref{fig:Colima-first}). Different plots have different scales on either time and space axes.}
        \label{fig:BAF-H-MC}
\end{figure}
Figure \ref{fig:BAF-H-MC} shows the mean flow height, $h(L,t)$, at the 51 spatial locations of interest, according to MC. In plot \ref{fig:BAF-H-MC}a, the only location is set on the center of the initial pile. %and the profile is similar to what observed in point $L_1$ of the inclined plane case study, in Fig.\ref{fig:Ramp-H}a. In this case the height decreases from the initial value to zero in $\sim 15 s$. In plots \ref{fig:BAF-H-MC}b,c,d,e, the locations are are set at less than $\sim 1$ km radius from the initial pile. %Their profiles are similar to point $L_2$ in Fig.\ref{fig:Ramp-H}b.
The height profile is bell-shaped, starting from zero and then waning back to zero in $\sim 20$ s. All the dynamics occur during the first minute. In plots \ref{fig:BAF-H-MC}f,g,h,i,j, points are set where the slope reduces, and the flow can channelize, and typically leaves a deposit. The distance from the initial pile is $\sim 2-3$ km. %The profiles are sometimes similar to $L_3$ of Fig.\ref{fig:Ramp-H}c, other times to $L_4$ of Fig.\ref{fig:Ramp-H}d, in a few cases showing intermediate aspects.

\begin{figure}[H]
         \centering
        \includegraphics[width=0.85\textwidth]{FigExtra.jpg}
        \caption{Volc{\'a}n de Colima (M{\'e}xico) overview, including six numbered locations (stars). In (a), (b), (c), (d) are enlarged the proximal topographic features to those locations. Initial pile is marked by a blue dot. Reported coordinates are in UTM zone 13N. Elevation isolines at every $10 m$ are displayed in black, at every $100m$ in bold blue. Elevation values in red.}
        \label{fig:Colima-extra}
\end{figure}

In general is either observed an initial short-lasting bulge followed by a slow decrease lasting for several minutes and asymptotically tending to a positive height, or a steady increase of material height tending to a positive height. In both cases it is sometimes observed a bimodal profile in the first 5 minutes. Finally, plots \ref{fig:BAF-H-MC}k,l focus on three points set at about the runout distance of the flow, in the most important ravines, at $\sim 4-5$ km from the initial pile. %Profiles are similar to what observed in point $L_4$ of Fig.\ref{fig:Ramp-H}d.

\subsubsection{Flow Height in Six Locations}\label{Obs2}
We select six preferred locations, illustrative of a range of flow regimes. They are $[L_8, L_{10}, L_{17}, L_{39}, L_{43}, L_{46}]$, as displayed in Figure \ref{fig:Colima-extra}. The first two points, $L_8$ and $L_{10}$, are both proximal to the initiation pile. Points $L_{17}$ and $L_{43}$ are placed where the slope is reducing and the ravines are evident, and $L_{39}$ and $L_{46}$ are placed in the channels, further down-slope. In particular, $L_8$, $L_{43}$, and $L_{46}$ are at the western side of the inundated area, whereas $L_{10}$, $L_{17}$, and $L_{39}$ are at the eastern side.

Figure \ref{fig:Colima-H} shows the flow height, $h(L,t)$, at the points $(L_i)_{i=8,10,17,39,43,46}$. Distances from the initial pile are in vertical projection. In plots \ref{fig:Colima-H}a,b, we show the flow height in points $L_8$ and $L_{10}$, $\sim 200 m$ and $\sim 500 m$ from the initial pile, respectively. Models MC and PF display similar profiles, positive for less than $15 s$ and bell-shaped. VS requires a significantly longer time to decrease, particularly in point $L_{10}$, where the average flow height is still positive after $\sim 200 s$. Peak average values in $L_8$ are $3.4 m$ in $PF$, $4.3 m$ in MC, $4.7 m$ in VS. Uncertainty is about $\pm 2 m$, halved on the lower side in $MC$, and $PF$. In $L_{10}$, models MC and PF are very similar, with peak height at $1.4 m$ and uncertainty $\pm 0.5 m$. Model VS, in contrast, has a maximum height of $1.1 m$ lasting for $50s$, and 95$^{th}$ percentile reaching $3.7 m$.

In plots \ref{fig:Colima-H}c,e, we show the flow height in points $L_{17}$ and $L_{43}$, both at $\sim 2 km$ from the initial pile. All the models show a fast spike during the first minute, followed by a slow decrease. There is still material after $1800 s$. VS has a secondary rise peaking at $\sim 450 s$, which is not observed in the other models. This produces higher values for the most of the temporal duration, but similar deposit thickness after more than 1 hour. Maximum values are $1 m$ for MC, $2 m$ for PF, and $1.5 m$ for VS, in both locations. The 5$^{th}$ percentile is zero in all the models, meaning that the parameter range does not always allow the flow to reach these locations. The 95$^{th}$ percentile is above $5 m$, except for VS in point $L_{17}$. In plots \ref{fig:Colima-H}d,f, we show the flow height in points $L_{39}$ and $L_{46}$, both placed at more than $3 km$ from the initial pile. The three models all show a monotone profile except for  MC in point $L_{39}$, which instead displays an initial spike and a decrease before to rise again. A similar thing is observed in the 95$^{th}$ percentiles of all the models. It is significant that the 5$^{th}$ percentile of PF becomes positive after $\sim 5400 s$, meaning that almost surely the flow has reached that location. Deposit thickness in point $L_{39}$ is $\sim 0.5 m$ for all the models, whereas in point $L_{46}$ it is $1.7 m$ in VS, $1.6 m$ in PF, and $1.2 m$ in MC.

We note that VS is temporally stretched compared to the other models, and material arrives later and stays longer in all the sample points. This is a consequence of the speed dependent term reducing flow velocity.

\begin{figure}[H]
         \centering
        \includegraphics[width=0.87\textwidth]{Height.png}
        \caption{Flow height in six locations. Bold line is mean value, dashed/dotted lines are 5$^{\mathrm{th}}$ and 95$^{\mathrm{th}}$ percentile bounds. Different models are displayed with different colors. Plots are at different scale, for simplifying lecture.}
        \label{fig:Colima-H}
\end{figure}

\section{Analysis and Discussion}\label{Hq2}
\subsection{Contributing Variables Proximal to the Initial Pile}
Figure \ref{fig:Colima-Pr1} shows the dominance factors $(P_i)_{i=1,\dots,4}$ of the RHS terms modulus, in the three proximal points $L_{8}$, $L_{10}$, and $L_{17}$, all closer than 1 km to the initial pile. This Figure is modified from \cite{Patra2018b} and its description is reported in this study for the sake of completeness.

The plots \ref{fig:Colima-Pr1}a,b,c and \ref{fig:Colima-Pr1}d,e,f are related to point $L_8$ and $L_{10}$, respectively. They are significantly similar. The gravitational force $\boldsymbol{RHS_1}$ is the dominant variable with a very high chance, $P_1>90\%$. In MC and PF there is a small probability, $P_3=5\%-30\%$, of $\boldsymbol{RHS_3}$ being the dominant variable for $\sim 5 s$. In VS it is observed a $P_4=5\%$ chance of $\boldsymbol{RHS_4}$ being dominant, just for a few seconds. Plots \ref{fig:Colima-Pr1}g,h,i concern the relatively more distal point $L_{17}$. They are split in two sub-frames at different time scale. In all the models, $\boldsymbol{RHS_2}$ is the most probable dominant variable, and its dominance factor has a bell-shaped profile. In all the models, also $\boldsymbol{RHS_1}$ has a small chance of being the dominant variable. In MC this chance is more significant, at most $P_1=30\%$ for $\sim 20 s$, and again $P_1=2\%$ in $[100, 7200] s$. In PF $P_1=15\%$ in two peaks, one short lasting at about $55 s$, and the second extending in $[100,500] s$. Also in VS, $P_1=15\%$ at $[300, 500] s$. Its profile is unimodal in time and becomes $P_1<2\%$ after $2000 s$. In MC and PF, $\boldsymbol{RHS_3}$ has a dominance factor $P_3=10\%$ at $[30, 50] s$ and $[40, 50] s$, respectively.
\begin{figure}[H]
         \centering
        \includegraphics[width=0.90\textwidth]{Pr1_total.png}
        \caption{Dominance factors of the forces in three locations in the first km of runout. Different models are plotted separately: (a,d,g) assume MC; (b,e,h) assume PF; (c,f,i) assume VS. Different colors correspond to different force terms. No-flow probability is displayed with a green dashed line.}
        \label{fig:Colima-Pr1}
\end{figure}
In summary, gravitational force is dominant with a very high chance until the no-flow probability becomes large. In MC and PF curvature related forces can also be dominant for a short time. In VS gravitational force is dominant for a larger time span than in the other models, because of the longer presence of the flow. The speed dependent friction can be dominant with a small probability at the beginning of the dynamics.

Figure \ref{fig:Colima-Ci_1} shows the corresponding expected contributions $\mathbb E[C_i]_{i=1,\dots,4}$. $\forall i$, $C_i$ is related to the force term $\boldsymbol{RHS_i}$. The contributions in points $L_8$ and $L_{10}$ are shown in \ref{fig:Colima-Ci_1}a,b,c and \ref{fig:Colima-Ci_1}d,e,f, respectively. The plots related to the same model are similar. In all the models $C_1$ is significantly larger than $C_2$ and $C_3$, which are almost equivalent in MC and PF, while $C_2>C_3$ in VS. $C_4$ always gives a negligible contribution, except in VS, where it is comparable to $C_2$. In $L_8$, following PF, $C_3$ is bimodal, whereas it is unimodal in MC and VS. This is not observed in $L_{10}$. In $L_8$, $C_3$ is greater than in $L_{10}$, compared to the other forces. VS always shows a slower decrease of the plots. In plots \ref{fig:Colima-Ci_1}g,h,i are shown the expected contributions in $L_{17}$. The plots are split in two sub-frames at different time scale. Initial dynamics is dominated by $C_2$, except for in MC, and only for a short time, $[30, 40] s$. In MC there is an initial peak of $C_2$ which is not observed in the other models. $C_3$ has a significant size, in MC and PF, and unimodal profile. In PF, after $C_3$ wanes, at about $60 s$ also $C_4$ becomes not negligible for $\sim 40 s$. The second part of the temporal domain is characterized by a slow decrease of $C_2>C_1$.
\begin{figure}[H]
         \centering
        \includegraphics[width=0.95\textwidth]{Ci1_total.png}
        \caption{Expected contributions of the forces in three locations in the first km of runout. Different models are plotted separately: (a,d,g) assume MC; (b,e,h) assume PF; (c,f,i) assume VS. Different colors correspond to different force terms.}
        \label{fig:Colima-Ci_1}
\end{figure}
\subsection{Contributing Variables Distal from the Initial Pile}
Figure \ref{fig:Colima-Pr2} shows the dominance factors $(P_i)_{i=1,\dots,4}$ in the three distal points $L_{39}$, $L_{43}$, and $L_{46}$, all more than 2 km far from the initial pile. Plots \ref{fig:Colima-Pr2}a,b,c and $L_{39}$ are dominated by $\boldsymbol{RHS_1}$. In all the models $P_1$ is increasing and $P_1>90\%$ at the end of the simulation. In MC, $P_1$ shows a plateau at $\sim 40\%$ in $[90,2000] s$ preceded and followed by steep increases, while in the other models it rises gradually. $P_2>0$ after $\sim 500 s$ and $3600 s$, respectively, but is never greater than $2\%$. In MC $P_3 \approx 10\%$ at $[50,70] s$. No-flow probability becomes zero in PF and VS, while stops at $20\%$ in MC. Plots \ref{fig:Colima-Pr2}d,e,f are related to point $L_{43}$, and are remarkably complex. In MC, either $P_1$ and $P_2$ are $\sim 35\%$ in the first $200 s$. Then, $P_2$ increases, and $\boldsymbol{RHS_2}$ becomes the only dominant variable after $3600 s$. The no-flow probability is never below $30\%$. $P_3=35\%$ in $[40, 60] s$. Instead, in PF $P_1>90\%$ until $3600 s$, and $P_2$ rises only in the very last amount of time, reaching $P_2=P_1=40\%$. The no-flow probability is very low during the most of the temporal window, rising at $20\%$ only at $7200 s$. Both $P_3$ and $P_4$ show short peaks, $\sim 10\%$, at $[50,60] s$.

In VS the no-flow probability is never below $20\%$, and the dominance factors are broadly equivalent to MC, although $P_1$ is the greatest up to $4000 s$, and $P_3\equiv0$. Plots \ref{fig:Colima-Pr2}g,h,i are related to point $L_{46}$ and they are similar to those recorded at point $L_{17}$, but $P_2>90\%$ and the no-flow probability decreases to zero in the second half of the simulation. Moreover, in all the models $P_1$ does not show any initial peak and instead increases slowly, reaching $P_1=10\%$ after more than $3600 s$.

In summary, only the gravity or the basal friction are dominant with high probability. Some of the points have a deposit at the end with a high chance, some other not, depending on the slope. In general, in MC the no-flow probability tends to be larger than in the other models, because some flow samples stops earlier, or completely leaves the site. Again, curvature can have a small chance to be dominant in MC and PF, particularly when the speed is high. Point $L_{43}$ deserves a specific discussion. It is not proximal to the initiation, but the no-flow probability is increasing at the end, meaning that all the material tends to leave the site. Moreover, the dominating force can be the gravity or the basal friction depending on the time and the model. In MC and VS both the two forces have similar chances to be dominant for  most of the time of the simulation. In PF, only the gravitational force is dominant with a high chance. This is probably because point $L_{43}$ is situated downhill of a place where a significant amount of material stops.
\begin{figure}[H]
         \centering
        \includegraphics[width=0.90\textwidth]{Pr2_total.png}
        \caption{Dominance factors of the forces in three locations after 2 km of runout. Different models are plotted separately: (a,d,g) assume MC; (b,e,h) assume PF; (c,f,i) assume VS. Different colors correspond to different force terms. No-flow probability is displayed with a green dashed line.}
        \label{fig:Colima-Pr2}
\end{figure}

Figure \ref{fig:Colima-Ci_2} shows the expected contributions in the distal points. In general, it is worth noting that the remarkable diversity in the dominance factors between the different locations can be the consequence of even a small imbalance between gravity and basal friction. All the plots are dominated by $C_1$ and $C_2$, and the remarkable differences observed in the dominance factors depend on which contribution is the greatest. In general these two contributions have similar profiles. Plots \ref{fig:Colima-Ci_2}a,b,c are related to point $L_{39}$ and $C_1>C_2$. In MC, also is $C_3>0$ for a short time. In MC and PF also $C_4>0$, but it is significantly lower than the previous contributions, almost negligible in MC. Plots \ref{fig:Colima-Ci_2}d,e,f concern point $L_{43}$. In MC $C_1<C_2$, in PF $C_1>C_2$, in VS they $C_1$ decreases and crosses $C_2$ at $\sim 3600 s$. The two contributions form a plateau in MC, in $[90, 200] s$. In MC and PF $C_3>0$ for a few seconds, and also $C_4>0$ with an initial spike at $\sim 60s$. In particular, $C_4$ reaches $\sim 0.05$ at point $L_{43}$, and in Figure \ref{fig:Colima-CC1}e we showed that the force contribution can also be at 0.1 in that location, depending on the input values chosen. This means that RHS4 can be about 10\% of the dominant force. In PF it shows a long lasting plateau, while becomes negligible in MC. Plots \ref{fig:Colima-Ci_2}g,h,i are related to point $L_{46}$. In all the models $C_1<C_2$, and these force contributions are monotone increasing. Only in MC $C_3>0$ shortly, and $C_4>0$, but almost negligible.
\begin{figure}[H]
         \centering
        \includegraphics[width=0.95\textwidth]{Ci2_total.png}
        \caption{Expected contributions of the forces in three locations after 2 km of runout. Different models are plotted separately: (a,d,g) assume MC; (b,e,h) assume PF; (c,f,i) assume VS. Different colors correspond to different force terms.}
        \label{fig:Colima-Ci_2}
\end{figure}

\subsection{Flow Extent and Spatial Integrals}
Figure \ref{fig:Colima-spatial} shows the volumetric average of speed and Froude Number. It also shows the inundated area as a function of time. Spatial averages and inundated area have smoother plots than local measurements, and most of the details observed in local measurements are not easy to discern. In plot \ref{fig:Colima-spatial}a the speed shows a bell-shaped profile in all the models, but the maximum speed is $\sim 60 m/s$ in MC, $\sim 50 m/s$ in PF, $\sim 20 m/s$ in VS, on average. Uncertainty is $\pm 18 m/s$ in MC, similar, but skewed, in VS, and $\pm 10 m/s$ in PF.

In plot \ref{fig:Colima-spatial}b, the $Fr$ profile is very similar to the speed, but the difference between VS and the other models is accentuated. Maximum values are $\sim 50$ in MC, $\sim 38$ in PF, $\sim 5$ in VS, whereas uncertainty is $\pm 10$ in MC, $\pm 7$ in PF, and skewed $[-5, +10]$ in VS. In plot \ref{fig:Colima-spatial}c, inundated area has a first peak in MC and PF, both at $\sim 1.15 km^2$, followed by a decrease to $0.55 km^2$ and $0.7 km^2$, respectively, and then a slower increase up to a flat plateau at $0.9 km^2$ and $1.5 km^2$, respectively. Uncertainty is $\sim \pm 0.2 km^2$ in both MC and PF until $\sim 100 s$, and then it increases at $\pm 0.3 km^2$ and $[-0.5, +0.4] km^2$, respectively. In MC this increase in uncertainty is concentrated at $\sim 100 s$, while it is more gradual in PF. VS has a different profile. The initial peak is only significant in the 95$^{th}$ percentile values, and occurs later, at $\sim 100 s$. The peak is of $\sim 1 km^2$ on the average, but up to $\sim 1.8 km^2$ in the 95$^{th}$ percentile. The decrease after the peak is very slow and the average inundated area never goes below $0.85 km^2$, and eventually reaches back to $\sim 1 km^2$. Uncertainty is $[-0.3, +0.2] km^2$.
\begin{figure}[H]
        \centering
        \includegraphics[width=0.85\textwidth]{AveragedColima.png}
        \caption{Comparison between spatial averages of $(a)$ flow speed, and $(b)$ Froude Number, in addition to the $(c)$ inundated area, as a function of time. Bold line is mean value, dashed/dotted lines are 5$^{\mathrm{th}}$ and 95$^{\mathrm{th}}$ percentile bounds. Different models are displayed with different colors. Figure modified from \cite{Patra2018b}.}
        \label{fig:Colima-spatial}
\end{figure}

\subsection{Power Integrals}
Figure \ref{fig:Colima-Power-spatial} shows the spatial sum of the powers. The estimates in this section assume $\rho = 1800 kg/m^3$ as a constant scaling factor. Corresponding plots of the force terms are included in \cite{Patra2018}. The scalar product of force with velocity imposes the bell-shaped profile. In general, gravity term is larger in VS, because a portion of the flow lingers on the higher slopes for a long time. Basal friction has a higher peak in PF compared to the other models, due to the interpolation of the two basal friction angles.

In plot \ref{fig:Colima-Power-spatial}a the power of $\boldsymbol{RHS_1}$ starts from zero and rises up to $1.4\times 10^{11}\ W$ in MC, $1.2\times 10^{11}\ W$ in PF, $6.5\times 10^{10}\ W$ in VS. Uncertainty is $\pm 4.0\times 10^{10}\ W$ in MC, $\pm 3.0\times 10^{10}\ W$ in PF, $[-4.0,5.0]\times 10^{10}\ W$ in VS. The decrease of gravitational power is related to the slope reduction, and this decrease is more gradual in VS than in the other models. In plot \ref{fig:Colima-Power-spatial}b the power of  $\boldsymbol{RHS_2}$ is always negative and peaks to $-6.5\times 10^{10}\ W$ in MC, $-5.0\times 10^{10}\ W$ in PF, $-2.0\times 10^{10}\ W$ in VS. In VS this dissipative power is significantly more flat than in the other models. MC and PF show negligible powers after $\sim 100 s$, VS after $\sim 200 s$. Uncertainty is $\pm 2.0\times 10^{10}\ W$ in MC, $\pm 1.5\times 10^{10}\ W$ in PF, $[-2.0,1.0]\times 10^{10}\ W$ in VS. In PF, the plot starts from stronger values than in the other models, but it is also the faster to wane. In plot \ref{fig:Colima-Power-spatial}c the power of $\boldsymbol{RHS_3}$ shows a negative peak at $-7.0\times 10^{10}\ W$ in MC, $-4.5\times 10^{10}\ W$ in PF, $-0.5\times 10^{10}\ W$ in VS. Uncertainty on the peak value is $[-4.5,3.5]\times 10^{10}\ W$ in MC, $[-2.5,2.0]\times 10^{10}\ W$ in PF, $[-1.0,0.5]\times 10^{10}\ W$ in VS. The three models all show a bell-shaped profile, MC and PF waning to zero at $90 s$, VS at $\sim 30 s$. In plot \ref{fig:Colima-Power-spatial}d the power of $\boldsymbol{RHS_4}$ has a different meaning in the three models. In MC it is the internal friction term, and it only has almost negligible ripple visible in the first second. In PF it is a depth averaged pressure force linked to the thickness gradient, and has a very small effect limited to the first second of simulation, at $0.5\times 10^{10}\ W$. It becomes null at $\sim 10 s$. In VS, instead, it is a speed dependent term, and has a very relevant effect. The plot shows a bell-shaped profile, with a peak of $-3.5\times 10^{10}\ W$, $[-2.0,1.0]\times 10^{10}\ W$. After that, this dissipative power gradually decreases, and becomes negligible at $200 s$.
\begin{figure}[H]
        \centering
        \includegraphics[width=0.95\textwidth]{PowersColima.png}
        \caption{Spatial integrals of the powers. Bold line is mean value, dashed lines are 5$^{\mathrm{th}}$ and 95$^{\mathrm{th}}$ percentile bounds. Model comparison on the mean value is also displayed. Different models are displayed with different colors.}
        \label{fig:Colima-Power-spatial}
\end{figure}

\subsection{Example of Model Performance Calculation}
%Analysis of model suitability have been conducted.
Finally, we give an example of model performance evaluation of the couple $\left(M, \mathcal P_M\right)$ according to a specific observation. In past work \citep{Patra2005}, MC rheology was tuned to match deposits for known block and ash flows, but {\it a priori} predictive ability was limited by inability to tune without knowledge of flow character. The new procedure developed in this study enables an enhanced quantification of model performance, i.e. the similarity of the outputs and real data. We remark that the measured performance refers to the couple $\left(M, \mathcal P_M\right)$, and that different parameter ranges can produce different performances \citep{Tierz2016, Sandri2018}. This is in contrast with traditional performance analysis based on particular, albeit calibrated, simulations \citep{Charbonnier2012}.

Our example concerns the Volc{\'a}n de Colima case study, and in particular we compare the inundated region in our simulations to the deposit of a BAF occurred 16 April 1991 \citep{Saucedo2004, Rupp2004, Rupp2006}. In addition to modeling uncertainty, other uncertainty sources affect this evaluation. For example, our digital elevation map may be different from the exact topography encountered by the real flow, and the inundated region may have been different from the extent of the deposit found in the field. The inundated region is defined as the points in which the maximum flow height $H$ is greater than $10 cm$. A similar procedure may be applied to any observed variable produced by the models, if specific data become available.
\begin{figure}[H]
         \centering
        \includegraphics[width=0.85\textwidth]{Histograms.jpg}
        \caption{Volc{\'a}n de Colima. Pdf of the similarity index of inundated regions and a real BAF deposit. (a) is based on $\mathcal M_I$, (b) on $\mathcal M_U$, (c) on $\mathcal J$. The models are MC (red), PF (blue), VS (black). Data histograms are displayed in the background. Global $5^{th}$ and $95^{th}$ percentile values are indicated with dashed lines. Plots (d,e,f) display the average inundated region $\left\{\textbf{x} : E[H(\textbf{x})]>10 cm\right\}$. The boundary of the real deposit is marked with a colored line.}\label{fig:Colima-Hist}
\end{figure}

Let $\mathcal M:\mathcal C(\mathbb R^2)\rightarrow[0,1]$ be a similarity index defined on the compact subsets of the real plane, i.e. the closed and bounded sets. An equivalent definition can be based on the pseudo-metric $1-\mathcal M$. For example, we define
$$\mathcal M_1:=\frac{\int_{\mathbb R^2} 1_D(\textbf{x})d\textbf{x}}{\int_{\mathbb R^2} 1_{S \cup D}(\textbf{x}) d\textbf{x}},\quad \mathcal M_2:=\frac{\int_{\mathbb R^2} 1_{S \cap D}(\textbf{x}) d\textbf{x}}{\int_{\mathbb R^2} 1_D(\textbf{x})d\textbf{x}}, \quad \mathcal J:=\mathcal M_1\cdot \mathcal M_2,$$
where $S\subset \mathbb R^2$ is the inundated region, and $D\subset \mathbb R^2$ is the recorded deposit.

In particular, $\mathcal M_1$ is the area of the deposit over the area of the union of inundated region and deposit, $\mathcal M_2$ is the area of the intersection of inundated region and deposit over the area of the deposit, $\mathcal J$ is the product of the previous, also called Jaccard Index \citep{Jaccard1901}. Figure \ref{fig:Colima-Hist} shows the probability distribution of the similarity indices, according to the uniform probability $\mathcal P_M$ on the parameter ranges defined in this study. Different metrics can produce different performance estimates, for example MC inundates most of the deposit, but overestimates the inundated region, while VS relatively reduces the inundated region outside of the deposit boundary, but also leaves several not-inundated spots inside it.

Let $g:[a,b]\rightarrow[0,1]$ be a score function defined over the percentile range of the similarity index. The global $5^{th}$ and $95^{th}$ percentile values $[a,b]$ are defined assuming to select the model randomly with equal chance, and are also shown in Fig.\ref{fig:Colima-Hist}a,b,c.

Then the performance score $G_g$ of model $\left(M, \mathcal P_M\right)$ is defined as:
$$G_g\left(M, \mathcal P_M\right)=\int_{[a,b]} g(x) df_M(x),$$
where $f_M$ is the pdf related to the model. Possible score functions include a step function at the global median, a linear or quadratic function, a sigmoid function. Table \ref{Table1} shows alternative performance scores, according to changing similarity indices and score functions.

\begin{table}[H]
         \centering
        \includegraphics[width=1\textwidth]{SuppInfo_table.pdf}
        \caption{Performance scores as a function of model, performance metric and score function.}
        \label{Table1}
\end{table}

\section{Conclusions}
In this study, we have described a statistically driven method for investigating the constituents of complex models. We implemented three different models composed of different assumptions about rheology in geophysical mass flows to illustrate the approach. The data objectively shows the performance of the models over many possible flow regimes and topographies. The analysis of contributing variables illustrates the impact of several modeling assumptions. Understanding which assumptions dominate, and by how much, is a key step towards the construction of more efficient models for desired inputs. Such model composition is the subject of ongoing and future work, with the purpose of bypassing the search for a unique best model, and going beyond a simple mixture of alternative models.

In summary, our new method enabled us to break down the effects of the different physical assumptions in the dynamics, providing an improved understanding of what characterizes each model. The procedure was applied to the Digital Elevation Map (DEM) of the SW slope of Volc\'{a}n de Colima (MX). In particular, we presented:
\begin{itemize}
  \item A short review of the assumptions characterizing three commonly used rheologies of Mohr-Coulomb, Poliquenne-Forterre, Voellmy-Salm. This included a qualitative list of such assumptions, and a breaking down of the different terms in the differential equations.
  \item A new statistical framework, processing the mean and the uncertainty range of either observable or contributing variables in the simulations. The new concepts of dominance factors and expected contributions enabled a simplified description of the local dynamics. These quantities were analyzed at selected sites, and spatial integrals were calculated, which illustrate the characteristics of the entire flow.
  \item The contribution  coefficients $C_i$ and dominance factors $P_i$  introduced here allow us to quantify and compare in a probabilistic framework the effect of modeling assumptions based on the full range of flows explored using statistically rigorous ensemble computations.
  \item A final discussion, explaining all the observed features in the results in light of the known physical assumptions of the models, and the evolving flow regime in space and time. This included an example of the model performance estimation method, which depends on the metric and the cost function adopted.
\end{itemize}

Our analysis uncovered the following main features of the different geophysical models used in the example analysis:
\begin{itemize}
  \item Compared to the standard MC model, the lack of internal friction in the PF model produces an accentuated lateral spread. The spread is increased by the uninhibited internal pressure force, which briefly pushes the flow ahead and laterally during the initial collapse. That force can also have some minor effects in the accumulation of the final deposit. The interpolation between the smaller bed friction angle $\phi_1$ and the larger value $\phi_2$ in the PF model suddenly stops the flow if it is thin compared to its speed. This mechanism suppresses large peaks in flow speed.
  \item In VS, the speed-dependent friction has a great effect in reducing lateral spread and producing channelized flow even where there are otherwise minor ridges and adverse slopes in the topography. The flow tends to be significantly slower and more stretched out in the downslope direction. The effects of different formulations of the curvature term have less impact than do the effects of lower basal friction and speed.
  \item In our case study the internal friction term in the MC model has a relatively small impact on the dynamics of the flow. Neglecting the term will be a meaningful choice if workload reduction is required in future analyses.
\end{itemize}

Furthermore, we can make the following statements about the technique in terms of its use on models in general:
\begin{itemize}
\item It gives information not only on which forces in the equations of motion are dominating the flow, but also shows where these forces are greatest and gives insight into why they locally peak and vary, and into the impacts of the dominating forces on the model flow outputs,
\item It provides a new, quantitative technique to evaluate the most important forces or phenomena acting in a particular model domain, which can supplement, provide insight and guidance into, and generate quantitative information for, the more typical methods used in force analysis of intuition and Similarity Theory.
\end{itemize}

Additional research concerning other case studies, and different parameter ranges, might reveal other flow regimes, and hence differences in the consequences of the modeling assumptions under new circumstances.

\section*{\small Acknowledgements}
An extended version of this manuscript has been released at https://arxiv.org/abs/1805.12104 \citep{Patra2018} . In particular, \cite{Patra2018} includes a further analysis of the additional case study of a small scale flow on inclined plane and flat runway, as well as additional details on the latin hypercubes based on orthogonal arrays, and supplementary datasets including estimates of local Froude Numbers and spatial integrals of force terms. We would like to acknowledge the support of NSF awards 1521855, 1621853, and 1339765.

\bibliographystyle{apalike}
\bibliography{mybibfile}

\end{document}