The main purpose of this package is to provide a C++ “framework” for implementing complex technical stock factors, e.g., “WorldQuant 101 Alphas” and “GTJA 191 Alphas” (the Chinese title of the latter research report is “基于短周期价量特征的多因子选股体系”, roughly “A Multi-Factor Stock Selection System Based on Short-Period Price-Volume Features”), in an efficient, maintainable and correct way.
This package currently implements all 191 alphas documented in GTJA’s research report. We plan to make the package extensible in the future, so that users can easily implement their own definitions by taking advantage of the C++ framework.
Most of these technical factors were generated by machines (through data mining), so they are often nested in multiple layers. For example, the formulas of alpha 87 and alpha 160 in “GTJA 191 Alphas” look like this:

```
Alpha87:
((RANK(DECAYLINEAR(DELTA(VWAP, 4), 7)) +
  TSRANK(DECAYLINEAR(((((LOW * 0.9) +
  (LOW * 0.1)) - VWAP) / (OPEN - ((HIGH + LOW) / 2))), 11), 7))
  * -1)

Alpha160:
SMA((CLOSE<=DELAY(CLOSE,1)?STD(CLOSE,20):0),20,1)
```
As you can see, these formulas are complicated in several ways:
- They use not only the historical prices of the individual stock but also information about its peers at each point in time (e.g., the cross-sectional RANK).
- The formulas are deeply nested, so it is difficult for researchers to implement them correctly.
- Some functions cannot be expressed directly in code, so a formula may end up implemented with bloated and therefore error-prone code.
- It is difficult to know how much historical data a given formula requires, which makes optimization and NA handling harder.
Efficiency also matters: because the effectiveness of technical factors declines quickly, factor values are needed at daily frequency. At the time of writing, there are more than 3,000 stocks in the A-share market. Even at 100 calculations per second, computing five years of history for a single factor takes 10.5 hours (5 * 252 * 3000 / 100 / 3600).
However, given the complexity of the formulas, without a framework it is easy to use future information by accident (a serious error in quant research) or to write incorrect code, while with a general-purpose framework it is hard to make the implementation efficient.
This package strives to provide a framework that lets you write such alpha formulas in an efficient, maintainable and correct way. The two alphas above can be implemented in C++ as follows.
The first one still looks complicated, but if you read it carefully, you can see that the code closely mirrors the original formula. In addition, the framework manages data access for you, which prevents you from using future data by accident. More importantly, it runs fast thanks to the zero-cost abstractions of C++: calculating the two alphas for all A-share stocks over the five years from January 2015 to December 2019 takes less than one minute on a regular PC using three cores.
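```cpp
// RANK is cross-sectional, so this factor takes all stocks (Quotes).
Alpha_mfun alpha087 = [](const Quotes& qts) -> Timeseries {
  // DECAYLINEAR(DELTA(VWAP, 4), 7)
  auto decay_linear1 = [](const Quote& qt) {
    return decaylinear(
      qt.ts<double>(7, [](const Quote& qt){ return delta(qt.ts_vwap(4)); })
    );
  };
  // DECAYLINEAR((((LOW * 0.9) + (LOW * 0.1)) - VWAP) / (OPEN - ((HIGH + LOW) / 2)), 11)
  auto decay_linear2 = [](const Quote& qt) {
    auto part1 = qt.ts_low(11) * 0.9 + qt.ts_low(11) * 0.1 - qt.ts_vwap(11);
    // (HIGH + LOW) / 2: the sum must be parenthesized before dividing
    auto part2 = qt.ts_open(11) - (qt.ts_high(11) + qt.ts_low(11)) / 2;
    return decaylinear(part1 / part2);
  };
  // TSRANK(..., 7)
  auto ts_rank = [decay_linear2](const Quote& qt) {
    return tsrank(qt.ts<double>(7, decay_linear2));
  };
  Timeseries part1 = rank(qts.apply(decay_linear1));  // cross-sectional RANK
  Timeseries part2 = qts.apply(ts_rank);
  return (part1 + part2) * -1.0;
};

// Alpha160 only needs the quote of a single stock.
Alpha_fun alpha160 = [](const Quote& qt) -> double {
  // CLOSE <= DELAY(CLOSE, 1) ? STD(CLOSE, 20) : 0
  auto fun = [](const Quote& qt) {
    return (qt.close() <= qt.close(1)) ? stdev(qt.ts_close(20)) : 0.0;
  };
  // SMA(..., 20, 1)
  return sma(qt.ts<double>(20, fun), 1);
};
```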
From R, the factors can be calculated as below.

```r
library(techfactor)
# tf_quote: sample daily quotes of a single stock, plus benchmark (BMK_*) columns
head(tf_quote)
#> DATE PCLOSE OPEN HIGH LOW CLOSE VWAP VOLUME AMOUNT
#> 1: 2018-01-02 31.06 31.45 32.99 31.45 32.56 32.46114 68343350 2218502767
#> 2: 2018-01-03 32.56 32.50 33.78 32.23 32.33 32.93164 64687020 2130249691
#> 3: 2018-01-04 32.33 32.76 33.53 32.10 33.12 32.89830 52908580 1740602533
#> 4: 2018-01-05 33.12 32.98 35.88 32.80 34.76 34.59591 84310196 2916787872
#> 5: 2018-01-08 34.76 35.11 36.96 35.11 35.99 36.04448 83078359 2994515872
#> 6: 2018-01-09 35.99 35.63 36.11 34.95 35.84 35.55054 47845909 1700947894
#> BMK_CLOSE BMK_OPEN
#> 1: 3405.275 3405.275
#> 2: 3429.864 3429.864
#> 3: 3442.373 3442.373
#> 4: 3446.696 3446.696
#> 5: 3459.510 3459.510
#> 6: 3470.250 3470.250
# use the last six trading dates of the sample as the calculation range
(from_to <- range(tail(tf_quote$DATE)))
#> [1] "2018-04-26" "2018-05-07"
# names of all registered factors, split into "normal" and "panel" factors
factors <- tf_reg_factors()
str(factors)
#> chr [1:191] "alpha001" "alpha002" "alpha003" "alpha004" "alpha005" ...
#> - attr(*, "normal")= chr [1:128] "alpha002" "alpha003" "alpha004" "alpha005" ...
#> - attr(*, "panel")= chr [1:63] "alpha001" "alpha006" "alpha007" "alpha008" ...
(normal_factor <- attr(factors, "normal")[1])
#> [1] "alpha002"
(panel_factor <- attr(factors, "panel")[1])
#> [1] "alpha001"
```
```r
# a normal factor can be calculated from a single stock's quote
qt <- tf_quote_xptr(tf_quote)
tf_qt_cal(qt, normal_factor, from_to)
#> alpha002
#> 2018-04-26 0.228474
#> 2018-04-27 -1.238390
#> 2018-05-02 1.376597
#> 2018-05-03 -1.302913
#> 2018-05-04 1.133333
#> 2018-05-07 -1.404219
```
```r
# tf_quotes: a named list of quotes for multiple stocks
head(tf_quotes[1])
#> $SZ300333
#> DATE PCLOSE OPEN HIGH LOW CLOSE VWAP VOLUME AMOUNT
#> 1: 2014-01-02 18.41 18.25 19.47 18.18 19.42 19.09579 4973297 94969018
#> 2: 2014-01-03 19.42 19.26 19.63 18.95 19.14 19.24656 4644767 89395800
#> 3: 2014-01-06 19.14 19.09 19.14 18.11 18.23 18.53877 3764967 69797853
#> 4: 2014-01-07 18.23 18.20 18.90 18.00 18.88 18.53837 3661866 67885019
#> 5: 2014-01-08 18.88 18.96 19.75 18.88 19.42 19.42419 5951451 115602106
#> ---
#> 1050: 2018-04-23 10.35 10.21 11.39 10.09 11.39 10.95390 75754103 829803230
#> 1051: 2018-04-24 11.39 10.99 12.53 10.86 12.53 11.71113 76179024 892142570
#> 1052: 2018-04-25 12.53 13.30 13.78 13.30 13.78 13.63979 88624083 1208814170
#> 1053: 2018-04-26 13.78 13.46 13.81 12.65 12.71 13.14234 87762946 1153410391
#> 1054: 2018-04-27 12.71 13.00 13.00 11.81 12.11 12.28674 69980225 859828858
#> BMK_CLOSE BMK_OPEN
#> 1: 1962.750 1962.750
#> 2: 1945.083 1945.083
#> 3: 1897.140 1897.140
#> 4: 1904.342 1904.342
#> 5: 1912.186 1912.186
#> ---
#> 1050: 3115.012 3115.012
#> 1051: 3181.041 3181.041
#> 1052: 3180.202 3180.202
#> 1053: 3120.147 3120.147
#> 1054: 3123.152 3123.152
# both normal and panel factors can be calculated for all stocks at once
qts <- tf_quotes_xptr(tf_quotes)
tf_qts_cal(qts, normal_factor, from_to)
#> SZ300333 SH601158 SZ002788 SH603101 SH600020 SH601668
#> 2018-04-26 1.8965517 0.08571429 1.564356 0.4707602 0.4444444 0.3563636
#> 2018-04-27 -0.4007534 -0.92500000 -1.666667 -1.0959596 -0.7777778 -0.3200000
#> SH600615 SZ002721 SZ300517 SH601567 SH603477 SZ002297
#> 2018-04-26 0.8758170 NA 0.5395764 -0.6153846 0.4538462 1.0396341
#> 2018-04-27 -0.3572985 NA 0.3940621 -0.2797203 -0.6760684 -0.6146341
#> SH600537 SH603906 SH603183 SZ002884 SZ300531 SZ002641
#> 2018-04-26 1.0476190 1.604159 1.0332307 0.4903226 1.494949 0.800000
#> 2018-04-27 -0.7142857 -1.049057 -0.1057424 -2.0000000 -1.122807 -1.666667
#> SZ002851 SH600719
#> 2018-04-26 0.4403752 -0.7593583
#> 2018-04-27 -0.5645370 0.7930283
tf_qts_cal(qts, panel_factor, from_to)
#> SZ300333 SH601158 SZ002788 SH603101 SH600020 SH601668
#> 2018-04-26 0.04664214 -0.6441926 -0.5135616 0.3013375 -0.9032789 -0.6186114
#> 2018-04-27 -0.28739263 -0.5270030 -0.4062324 0.6185499 -0.8719299 -0.5925445
#> SH600615 SZ002721 SZ300517 SH601567 SH603477 SZ002297
#> 2018-04-26 0.1917762 NA -0.6290566 -0.8998849 -0.06515103 -0.07402860
#> 2018-04-27 0.5320094 NA -0.7884800 -0.9069789 -0.05118596 -0.06976503
#> SH600537 SH603906 SH603183 SZ002884 SZ300531 SZ002641
#> 2018-04-26 -0.296330051 -0.8887960 -0.2486362 -0.5629602 -0.6881489 -0.4926093
#> 2018-04-27 -0.009231862 -0.3809784 -0.2365224 0.4326827 -0.4215067 0.1503632
#> SZ002851 SH600719
#> 2018-04-26 NA 0.10130851
#> 2018-04-27 -0.6780386 0.05142658
```
```r
xfun::session_info(packages = 'techfactor')
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Catalina 10.15.3
#>
#> Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
#>
#> Package version:
#> anytime_0.3.7 BH_1.72.0.3 data.table_1.12.9 graphics_3.6.2
#> grDevices_3.6.2 grid_3.6.2 lattice_0.20.38 magrittr_1.5
#> methods_3.6.2 Rcpp_1.0.4.5 stats_3.6.2 techfactor_0.2.0
#> utils_3.6.2 xts_0.12.0 zoo_1.8.7
```