Improvement: Change Decimal to simple float #20
This looks interesting: https://github.com/jrmuizel/pyunum
Ryan, thanks a lot for carrying out the speed profiling. Very useful! This is actually quite a tricky issue and we'll need to tread carefully here.

The motivation for using Decimal values was to ensure 100% accuracy of the pricing calculations. While this may seem rather onerous for the majority of backtesting purposes, it is an absolute necessity for regulatory/audit purposes in the institutional world. As an anecdote, in the original fund I worked for, after around 8-10 months we had a difference of $0.12 in our Net Asset Value calculations when using floating point compared to Decimal. This may seem like small change, but for audit purposes it is essentially a "lost" 12 cents and had to be accounted for.

I definitely don't want to penalise speed for 95% of QSTrader users just to ensure institutional compatibility, but I would love to find a way to offer both options, perhaps as a configuration setting. However, I don't see an easy way to do this without a lot of code branching or class duplication. Thoughts?
@mhallsmoore I can imagine the pain! Completely agree, so I'm very wary of making any decisions on this. One method I've come across in C/C++ for a ticker-plant, which seems crude, is to multiply out the decimal places so we end up working with integers; we just have to define how much precision we want. We would have to be careful to ensure that this is a blanket rule applied across the entire system, with the only exception being printing values to screen.

I think in the end a compromise will have to be made one way or the other -- catering to both will add a scary amount of complexity. It's not too difficult to change this one way or the other, so I can't see any reason to do it "right now" -- best to be sure of it first.
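The multiply-out idea above can be sketched in a few lines. This is a minimal illustration assuming a four-decimal-place scale; `PRICE_MULT`, `to_fixed` and `to_display` are hypothetical names, not QSTrader code:

```python
# Fixed-point sketch: store prices as scaled integers, do all
# arithmetic on ints, and only divide back out at display time.
PRICE_MULT = 10**4  # fixed-point scale: 4 decimal places

def to_fixed(price_str):
    """Parse a price string into a scaled integer."""
    return int(round(float(price_str) * PRICE_MULT))

def to_display(fixed):
    """Convert back to a string only at the display boundary."""
    return "%.4f" % (fixed / PRICE_MULT)

buy = to_fixed("101.2550")   # 1012550
sell = to_fixed("101.2575")  # 1012575
pnl = sell - buy             # exact integer arithmetic: 25
print(to_display(pnl))       # 0.0025
```

The key discipline, as noted above, is that the scaling rule must be applied everywhere in the system, with division by the multiplier happening only at the display boundary.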
Interesting discussion! Good chance I'm naive here... but why not wrap the Decimal conversion currently used in the code, so that the wrapper checks a config setting and, if simple floats are acceptable, returns a plain float instead of the Decimal-rounded result? I haven't spent much time inspecting the code to see whether this approach would work, but thought I would throw it out there!

Ryan
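The wrapper idea could look something like the following minimal sketch; `USE_DECIMAL` and `to_price` are hypothetical names standing in for a real settings module and conversion function, not part of the QSTrader codebase:

```python
from decimal import Decimal

# Hypothetical config flag -- imagine this comes from a settings module.
USE_DECIMAL = True

def to_price(s):
    """Convert a price string, honouring the accuracy/speed config flag."""
    if USE_DECIMAL:
        return Decimal(s).quantize(Decimal("0.00001"))
    return float(s)

print(to_price("101.25"))   # 101.25000 (a Decimal)
USE_DECIMAL = False
print(to_price("101.25"))   # 101.25 (a float)
```

This keeps the branching in one place, at the cost of every price value flowing through a single chokepoint.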
After a little searching, it seems that gmpy (https://github.com/aleaxit/gmpy) or cdecimal (https://pypi.python.org/pypi/cdecimal/) might do the trick. It implies adding more dependencies to the project, though it seems to be worth it when seeing @ryankennedyio's results...
For what it's worth, here is some further discussion on how exchanges disseminate data; it seems integer or long is the way to go. FWIW I'm taking this approach with Cassandra when storing my data now.

I really believe the best result will be had by using 10,000 as a constant price multiplier and working with values as fixed-point data types, accurate to 1/100th of a cent. Whenever displaying numbers to screen, just divide them through by the price multiplier. Thoughts @mhallsmoore?
I think 10,000 is the way to go for the price multiplier, as this will easily allow forex positions when we eventually add them down the line. This will be quite a bit of work, but thankfully I have prior unit tests in place that will enforce the correct prices; I just need to modify them from Decimal to integer.
ZF price looks like 121.2890625, so 10,000 is not enough.
10,000,000? Every integer in Python 3.x is arbitrary-precision, so technically that's OK. Readability becomes an issue at some point, since the user has to divide by the multiplier in order to read a value, though that sort of enforces best practice.
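A quick sanity check of the 10,000,000 multiplier against the ZF price quoted above (`MULT` and `to_fixed` are illustrative names only):

```python
MULT = 10**7  # seven decimal places of fixed-point precision

def to_fixed(price):
    return int(round(price * MULT))

zf = 121.2890625          # a 1/128th Treasury tick: needs 7 decimals
fixed = to_fixed(zf)
print(fixed)              # 1212890625 -- exact, nothing lost
print(fixed / MULT)       # 121.2890625, dividing back to read it
# Python 3 ints are arbitrary precision, so large scaled values are safe.
```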
Yes. That should do it. |
A clean approach may be to use multiple dispatch (https://en.wikipedia.org/wiki/Multiple_dispatch, https://github.com/mrocklin/multipledispatch/). It's a key concept of Julia, but I haven't used it with Python.

This code could also help. To parse strings:

```python
from decimal import Decimal

TWOPLACES = Decimal("0.01")
FIVEPLACES = Decimal("0.00001")


class Parser(object):
    pass


class FloatParser(Parser):
    def price(self, s):
        return float(s)

    def volume(self, s):
        return float(s)

    def amount(self, s):
        return float(s)

    def midpoint(self, a, b):
        return (a + b) / 2.0


class DecimalParser(Parser):
    def price(self, s):
        return Decimal(s).quantize(FIVEPLACES)

    def volume(self, s):
        return Decimal(s)

    def amount(self, s):
        return Decimal(s).quantize(TWOPLACES)

    def midpoint(self, a, b):
        return (a + b) / Decimal("2.0")


class IntegerParser(Parser):
    PRICE_MULT = 10**5
    VOLUME_MULT = 10**2
    AMOUNT_MULT = 10**2

    def price(self, s):
        return int(float(s) * self.PRICE_MULT)

    def volume(self, s):
        return int(float(s) * self.VOLUME_MULT)

    def amount(self, s):
        return int(float(s) * self.AMOUNT_MULT)

    def midpoint(self, a, b):
        return (a + b) // 2  # integer division
```

To display price, volume, amount...:

```python
class Display(object):
    pass


class IntegerDisplay(Display):
    PRICE_DIGITS = 5
    PRICE_FORMAT = "%.5f"
    PRICE_MULT = 10**PRICE_DIGITS
    VOLUME_DIGITS = 2
    VOLUME_FORMAT = "%.2f"
    VOLUME_MULT = 10**VOLUME_DIGITS
    AMOUNT_DIGITS = 2
    AMOUNT_FORMAT = "%.2f"
    AMOUNT_MULT = 10**AMOUNT_DIGITS

    def price(self, x):
        return self.PRICE_FORMAT % (x / self.PRICE_MULT)

    def volume(self, x):
        return self.VOLUME_FORMAT % (x / self.VOLUME_MULT)

    def amount(self, x):
        return self.AMOUNT_FORMAT % (x / self.AMOUNT_MULT)
```
Okay, I've spent two or three hours this evening revisiting the above, and fortunately the backtests now run in about 90% less time. Honestly, I think my first profiles were done with Statistics commented out -- so Statistics was definitely responsible for the bulk of that time. Sorry!!! That was honestly the very first thing I ever wrote in Python... Anyway, tl;dr:

All profiling was done on a 2.5GHz processor, using the example MAC strategy on the SP500TR ticker, at master and at ryankennedyio@4ee3c07, ryankennedyio@e67070c, ryankennedyio@f4045ee and ryankennedyio@2b7c74e. So, from ~15s down to less than 2 seconds. Loads better.

@femtotrader that actually looks exactly like what I was thinking of. I didn't know the name, and those examples are great. Basically, as soon as I include multiple dispatch into my branch and rewrite most of the tests, I'm happy to PR it into master. Feedback on user-friendliness with other people's algorithms will be appreciated.
This is really good -- thanks Ryan and femto. Down from ~15s to 2s is a vast improvement, which will really pay dividends in parameter studies. I do like the multiple dispatch approach; it nicely separates out the calculation code.
Nice job @ryankennedyio, but I really think an interesting metric for profiling is the number of ticks processed per second; see mhallsmoore/qsforex#18. @mhallsmoore if you like the multiple dispatch approach, you should like Julia 😄
@ryankennedyio A possible improvement may also be to use Enum.
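The Enum suggestion could replace the string comparisons like `event.type == 'TICK'` seen later in this thread. A minimal sketch, where `EventType` is a hypothetical name rather than an actual QSTrader class:

```python
from enum import Enum

class EventType(Enum):
    TICK = 1
    BAR = 2
    SIGNAL = 3
    ORDER = 4
    FILL = 5

event_type = EventType.TICK
# Identity comparison on Enum members avoids string equality checks
# and catches typos at attribute-lookup time instead of silently failing.
if event_type is EventType.TICK:
    print("handling tick")
```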
I did some speed measurements of qstrader with random data. I added:

```python
    def _speed(self, i):
        return i / (time.time() - self.t0)

    @property
    def speed_iters(self):
        return self._speed(self.iters)

    @property
    def speed_ticks(self):
        return self._speed(self.ticks)

    @property
    def speed_bars(self):
        return self._speed(self.bars)

    def _s_speed(self, s, i):
        return "%d %s processed @ %f %s/s" % (i, s, self._speed(i), s)

    def s_speed_iters(self):
        return self._s_speed("iters", self.iters)

    def s_speed_ticks(self):
        return self._s_speed("ticks", self.ticks)

    def s_speed_bars(self):
        return self._s_speed("bars", self.bars)
```

to:

```python
if event.type == 'TICK':
    self.cur_time = event.time
    if self.ticks % self.N == 0:
        print("Tick %s, at %s" % (self.ticks, self.cur_time))
        print(self.s_speed_ticks())
```

In the current code, we are processing fewer than 150 ticks per second!!!!

About printing ticks, bars, ... I think we should have a
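The measurement above can be reduced to a self-contained sketch; `ThroughputMeter` and its methods are illustrative names, not QSTrader classes:

```python
import time

class ThroughputMeter(object):
    """Count processed events and report throughput in events/second."""

    def __init__(self):
        self.t0 = time.time()
        self.ticks = 0

    def on_tick(self):
        self.ticks += 1

    @property
    def speed_ticks(self):
        elapsed = time.time() - self.t0
        # Guard against a zero-length interval on very fast runs.
        return self.ticks / elapsed if elapsed > 0 else float("inf")

meter = ThroughputMeter()
for _ in range(10000):
    meter.on_tick()
print("%d ticks @ %.0f ticks/s" % (meter.ticks, meter.speed_ticks))
```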
It's also very interesting to see how speed evolves over time.
@femtotrader If you're testing that on the current master, it'll slow down over time due to the use of Pandas Series; it turns out Series is really bad at inserting data at indexes. If you add my repo as a Remote Upstream and merge

I guess @mhallsmoore will have to draw some distinction on what he wants "out-of-the-box" for users right away, without being overly complex or daunting (with regard to strategies etc.). Having slow example strategies really doesn't matter, as long as the "core" system will support faster ones written by the user.

The thing I find most useful about this system is that it's designed in a really modular way -- I can just about plug and play anything I want into or out of it, while still being able to maintain dependencies on the rest of the codebase as it's updated.
Some useful profiling tools. Running unit tests with the nose-timer plugin is very easy to set up (you just need to install it):

```
$ nosetests -s -v --with-timer
```

Other tools (maybe more complex to set up):

vbench https://github.com/wesm/vbench
Airspeed Velocity https://github.com/spacetelescope/asv

PS: the pandas devs are considering moving from vbench to asv.
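For per-function breakdowns, the standard library's cProfile and pstats also work without any extra dependencies. A sketch on a stand-in workload (profiling a real backtest would wrap its run call instead):

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in for the backtest loop being profiled.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats()

report = stream.getvalue()
print("workload" in report)  # the profiled function appears in the stats
```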
Ok, aiming to have this wrapped up this weekend. The main things left are multiple dispatch and fixing the numerous merge conflicts against the new master. @mhallsmoore it would be great if you could run through the open PRs and merge them into master when you get a tick, so this will slot right in when I'm done rather than needing another round of merge conflicts :)
Phew! It took longer than expected to get up to speed with these new changes in the codebase. Very nice @femtotrader. Sadly I only got to squeeze an hour in this weekend. Bah. Not happy with how I'm using PRICE_MULTIPLIER and pulling
How many BARS/s or TICKS/s did you get now with the sample strategy?
Ideally I would prefer a solution with support of all these types:
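Type-based dispatch over float, int and Decimal can be sketched with only the standard library. `functools.singledispatch` dispatches on the first argument's type (the multipledispatch package linked earlier generalises this to all arguments); `midpoint` here mirrors the parser example above but is illustrative only:

```python
from decimal import Decimal
from functools import singledispatch

@singledispatch
def midpoint(a, b):
    raise TypeError("unsupported type: %r" % type(a))

@midpoint.register(float)
def _(a, b):
    return (a + b) / 2.0

@midpoint.register(int)
def _(a, b):
    return (a + b) // 2  # fixed-point integers: integer division

@midpoint.register(Decimal)
def _(a, b):
    return (a + b) / Decimal(2)

print(midpoint(1.0, 2.0))                        # 1.5
print(midpoint(10000, 20000))                    # 15000
print(midpoint(Decimal("1.0"), Decimal("2.0")))  # 1.5
```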
Good point -- supporting all types seems like it will be fine with multiple dispatch. Very clean. Buy and hold:
Maybe this issue can be closed.
Speed has been frustrating me a little while backtesting; I have had a hunch it was due to the Decimal calculations being used for accuracy's sake. A little research brought up this discussion from the quantopian repo.
I have made a small start here, tests are currently broken though.
https://github.com/ryankennedyio/qstrader/tree/decimal-to-float
Some preliminary testing on my macbook air (1.7GHz dual-core i7, 8GB RAM) show:
SP500 backtest at 7c88d26:
15.6sec
15.2sec
15.8sec
SP500 backtest at 2373c59 (from my branch):
2.88sec
3.03sec
2.90sec
Decimal usage seems to lead to a roughly 5x longer backtest (rule of thumb). The gap may be even larger for strategies that perform many price calculations.
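The gap is easy to reproduce in isolation with a micro-benchmark. This is a rough sketch -- absolute numbers depend on the machine and Python version (Python 3's C-accelerated decimal narrows the gap considerably), and the Decimal timing here includes string construction on every iteration:

```python
import timeit
from decimal import Decimal

n = 100000
t_float = timeit.timeit(lambda: 101.25 * 0.999 + 0.01, number=n)
t_decimal = timeit.timeit(
    lambda: Decimal("101.25") * Decimal("0.999") + Decimal("0.01"),
    number=n,
)
print("float:   %.4fs" % t_float)
print("decimal: %.4fs" % t_decimal)
print("ratio:   %.1fx" % (t_decimal / t_float))
```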
I understand the motivation for using Decimal, so I will also update here with the difference between the unit test results when using Decimal versus float, to see how large the calculation differences end up being.