This project is made to calculate portfolio risk metrics Value-at-risk (VaR) and Conditional Value-at-risk (CVaR) using stochastic multivariate copula approach. The main idea of calculations is the following. Consider a multivariate financial time-series (stock prices for example) and transform them to log-returns. Then we do
- Fit marginal distributions. Implemented: normal (best perfomance), Levy stable (heavy-tailed), generalized hyperbolic (heavy-tailed).
- Fit joint distribution. We use multivariate Archimedian copla approach. Implemented Frank, Gumbel, Clayton and Joe copulas. For this copulas we consider a classical model with constant parameter (MLE) and time-dependend model where Ornstein-Uhlenbeck process is used as a copula parameter. This extension of model could improve joint distribution fit about 15-20% more for log-likelihood.
- Calculate risk metrics. WE calculate metrics VaR and CVaR using famous paper Rockafellar, Uryasev Optimization of conditional value-at-risk (2000). Authors proved two theorems that provides one to calculate metrics VaR and CVaR by minimazing special function and also build a CVaR optimized portfolio.
- Assess goodness-of-fit (GoF) metrics. For copulate GoF we use Rosenblatt transform approach and calculate corresponding p-value; for risk metrics -- Kupiec likelihood ratio test and critical statistics.
More information see below in Mathematical basis section.
This project is made during the MSc program "Financial mathematics and financial technologies" in Sirius University, Sochi, Russia. 2023-2024.
Installation could be made using pip:
git clone https://github.com/AANovokhatskiy/pyscarcopula
cd pyscarcopula
pip install .
We consider the following distributions for logarithmic returns:
Normal
The implementations is made manually and based on numba library. So this shows the best perfomance, but performs bad with heavy-tailed data.
Levy stable
with
Here we just use an existing scipy implementation. So see details in the corresponding page. Parallelization on all available cpu cores is made using joblib.
Generalized hyperbolic
Here we also use an existing scipy implementation. Parallelization on all available cpu cores is made using joblib.
According to Sklar's theorem joint distribution could be constructed using special function of marginal distributions is named copula.
Here we consider only a class of single-parameter copulas known as Archimedian. Denote
Here we use the following copulas
Copula | |||
---|---|---|---|
Gumbel | |||
Frank | |||
Joe | |||
Clayton |
Classical approach implies that
and
This approach for discrete case was proposed by Liesenfield and Richard, Univariate and Multivariate Stochastic Volatility Models: Estimation and Diagnostics (2003) and used in Hafner and Manner, Dynamic stochastic copula models: estimation, inference and applications (2012). Here we develop this approach in continious case.
Let
Rockafellar and Uryasev in Optimization of conditional value-at-risk (2000) proved that:
Here we solve this minimizations problems numerically using Monte-Carlo estimates of function
For copula GoF we use Rosenblatt transform. For given multivariate pseudo observations dataset
is uniformly disturbed (see details in Hering and Hofert, Goodness-of-fit Tests for Archimedean Copulas in High Dimensions (2015)). This fact could be checked using various methods. We use a scipy implementation of Cramer-Von-Mises test.
We could also check a statistical significance of found risk metrics estimates. A simple likelihood ratio test proposed by Kupiec (Techniques for verifying the accuracy of risk measurement models (1995)) is following. Consider statistics
where
import pandas as pd
import numpy as np
moex_data = pd.read_csv("data/moex_top.csv", index_col=0)
tickers = ['AFLT', 'LSRG', 'GAZP', 'NLMK']
returns_pd = np.log(moex_data[tickers] / moex_data[tickers].shift(1))[1:501]
returns = returns_pd.values
from pyscarcopula import GumbelCopula
copula = GumbelCopula(dim = 4)
Also available GumbelCopula
, JoeCopula
, ClaytonCopula
, FrankCopula
.
For initialized copula it is possible to show a cdf as a sympy expression:
copula.sp_cdf()
with output
It is also possible to call sp_pdf() to show pdf expression. But the anwser would be much more complex.
fit_result = copula.fit(data = returns, latent_process_tr = 500, m_iters = 5, accuracy=1e-4,
method = 'scar-m-ou', seed = 333)
fit_result
Function fit description:
- data -- initial dataset: log-returns or pseudo observations (see to_pobs description). Type Numpy Array. Required parameter.
- method -- calculation method. Type Literal. Available methods: mle, scar-p-ou, scar-m-ou, scar-p-ld. Required parameter.
- mle - classic method with constant parameter
- scar-p-ou - stochastic model with Ornstein-Uhlenbeck process as a parameter. latent_process_tr = 10000 is good choice here
- scar-m-ou - stochastic model with Ornstein-Uhlenbeck process as a parameter and implemented importance sampling Monte-Carlo techniques. latent_process_tr = 500 is good choice here
- scar-p-ld - stochastic model with process with logistic distribution transition density (experimental). latent_process_tr = 10000 is good choice here
-
accuracy -- calculation accuracy. Type float. Optional parameter. Default value:
$10^{-5}$ для mle,$10^{-4}$ for other methods. -
latent_process_tr -- number of latent process trajectories (for mle is ignored). Type int. Optional parameter. Default value
$500$ . -
m_iters -- number of importance sampling steps (only for scar-m-ou). Type int. Optional parameter. Default value
$5$ . - to_pobs -- transform data to pseudo observations. Type bool. Optional parameter. Default value True.
- dwt -- Wiener process values. Type Numpy Array. Optional parameter. Default value None. If None then function generate dataset automatically.
- seed -- random state. Type int. Optional parameter. Default value None. If None then every run of program would unique. Parameter is ignored if dwt is set explicitly.
- alpha0 -- starting point for optimization problem. Type Numpy Array. Optional parameter.
- init_state -- initial state of latent process. Type Numpy Array. Optional parameter.
fit result is mostly scipy.minimize output
message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
success: True
status: 0
fun: 184.55204171425615
x: [ 9.903e-01 6.082e-01 6.261e-01]
nit: 6
jac: [ 2.171e-01 3.244e-01 -9.893e-02]
nfev: 32
njev: 8
hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
name: Gumbel copula
method: scar-m-ou
latent_process_tr: 500
m_iterations: 5
Where fun is log likelihood and x - parameter set
Goodness of fit for copula using Rosenblatt transform:
from pyscarcopula.stattests import gof_test
gof_test(copula, moex_returns, fit_result, to_pobs=True)
with output
CramerVonMisesResult(statistic=0.11370981269840286, pvalue=0.5219822616590395)
It is possible to get sample of pseudo observations from copula
copula.get_sample(N = 1000, r = 1.2)
The output is array of shape = (N, dim). Copula parameter r should be set manually (array of size N is also supported).
from pyscarcopula.metrics import risk_metrics
gamma = [0.95]
window_len = 250
latent_process_tr = 10000
MC_iterations = [int(10**5)]
M_iterations = 5
#fastest calculations
# method = 'mle'
# marginals_method = 'normal'
#More precise calculations. For appropiate choice of latent_process_tr and M_iterations see README.md
method = 'scar-p-ou' # or use 'scar-m-ou'
marginals_method = 'hyperbolic'
count_instruments = len(tickers)
portfolio_weight = np.ones(count_instruments) / count_instruments
result = risk_metrics(copula,
returns,
window_len,
gamma,
MC_iterations,
marginals_params_method = marginals_method,
latent_process_type = method,
latent_process_tr = latent_process_tr,
optimize_portfolio = False,
portfolio_weight = portfolio_weight,
seed = 111,
M_iterations = M_iterations,
save_logs = False
)
Function risk_metrics description
- copula -- object of class ArchimedianCopula (and inherited classes). Type ArchimedianCopula. Required parameter.
- data -- log-return dataset. Type Numpy Array. Required parameter.
- window_len -- window len. Type int. Required parameter. To use all available data use length of data.
-
gamma -- significance level. Type float or Numpy Array. Required parameter. If Numpy Array then calculations is made for every element of array. Made for minimizaion of repeated calculations. Usually one set, for example,
$0.95$ ,$0.97$ ,$0.99$ or array$[0.95, 0.97, 0.99]$ . - latent_process_type -- type of stochastic process that used as copula parameter. Type Literal. Available methods: mle, scar-p-ou, scar-m-ou, scar-p-ld. Required parameter.
- latent_process_tr -- number of Monte-Carlo iterations that used for copula fit. Type int. Required parameter.
- marginals_params_method -- method of marginal distribution fit. Type Literal. Available methods: normal, hyperbolic, stable. Required parameter.
-
MC_iterations -- number of Monte-Carlo iterations that used for risk metrics calculations. Type int or Numpy Array. Required parameter. If Numpy Array then calculations is made for every element of array. Possible value, for example,
$10^4$ ,$10^5$ ,$10^6$ and so on. -
optimize_portfolio -- parameter responsible for the need to search for optimal CVaR weights.Type Bool. Optional parameter. Default value
$True$ . - portfolio_weight -- portfolio weight. Type Numpy Array. Optional parameter. If optimize_portfolio = True this value is ignored. Default value -- equal weighted investment portfolio.
-
save_logs -- parameter responsible for the need to save logs of copula parameters search. If save_logs = True then function would create folder logs in the current directory and save copula parameters in csv file. Optional parameter. Default value
$False$ .
Calculated values could be extracted as follows
var = result[0.95][1000000]['var']
cvar = result[0.95][1000000]['cvar']
portfolio_weight = result[0.95][1000000]['weight']
Plot the CVaR metrics:
from pyscarcopula.metrics import cvar_emp_window
from matplotlib import pyplot as plt
import matplotlib.ticker as plticker
pd_var_95 = pd.Series(data = -result[gamma[0]][MC_iterations[0]]['var'], index=returns_pd.index).shift(1)
pd_cvar_95 = pd.Series(data = -result[gamma[0]][MC_iterations[0]]['cvar'], index=returns_pd.index).shift(1)
weight = result[gamma[0]][MC_iterations[0]]['weight']
n = 1
m = 1
i1 = 250
i2 = 499
fig,ax = plt.subplots(n,m,figsize=(10,6))
loc = plticker.MultipleLocator(base=27.0)
daily_returns = ((np.exp(returns_pd) - 1) * weight).sum(axis=1)
cvar_emp = cvar_emp_window(daily_returns.values, 1 - gamma[0], window_len)
ax.plot(np.clip(daily_returns, -0.2, 0.2)[i1:i2], label = 'Portfolio log return')
ax.plot(cvar_emp[i1:i2], label = 'Emperical CVaR', linestyle='dashed', color = 'gray')
ax.plot(pd_cvar_95[i1:i2], label= f'{method} {marginals_method} CVaR 95%')
ax.set_title(f'Daily returns', fontsize = 14)
ax.xaxis.set_major_locator(loc)
ax.set_xlabel('Date', fontsize = 12, loc = 'center')
ax.set_ylabel('Log return', fontsize = 12, loc = 'center')
ax.tick_params(axis='x', labelrotation = 15, labelsize = 12)
ax.tick_params(axis='y', labelsize = 12)
ax.grid(True)
ax.legend(fontsize=12, loc = 'upper right')
Goodness of fit for found CVaR using Kupiec test.
from pyscarcopula.stattests import Kupiec_POF
POF = Kupiec_POF(daily_returns.values[i1:i2], pd_cvar_95.values[i1:i2].flatten(), 1 - gamma[0])
and the output:
N = 249, Expected drawdowns = 12.45, Observed drawdowns = 7, x/N = 0.0281, p = 0.05
critical_value = 3.8415, estimated_statistics = 2.9633, accept = True
Examples of usage of this code coulde be found in example.ipynb notebook.