Performance improvement and numpy vectorization #61
String vs. bytes, hex vs. bin: I knew this day would come 😃 I will think about this. Perhaps this would be a major change for pyModeS 3.x in the future, since many dependencies may break... @xoolive, any thoughts?
Indeed, it would be great to clarify which data structures are used, as this would also benefit performance. We tried some performance improvements with Cython in the speedup branch and saw that the choice of the most relevant structures to support was at stake. The lessons to take from this attempt, imho, are:
There are a lot of projects using Cython (see uvloop or asyncpg for reference). I am not aware of any unsupported architectures. Personally, I use it in a lot of my projects on an x86_64 laptop and on multiple ARM-based Raspberry Pis (A, Zero, 3, 4). IMHO, I would stick to Cython alone for certain functions and avoid code duplication. (However, if you decide to provide two implementations, please let me know. I might have some optimizations for cprNL in common.py.)
For the moment, I merged only the Cython code for the … In the current pyModeS version 2.x, even with the optional Cython …
I think this might be the most logical way forward. Maybe use a decorator function to check the inputs? I am not sure how much slower this would be...
On compatibility and supported inputs: looking at the API, I think the first parameter of every function is a message. Therefore,
This should allow backward compatibility as well.
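A minimal sketch of the decorator idea, assuming (as above) that the message is always the first positional argument. The names accepts_bytes and typecode are hypothetical illustrations, not pyModeS functions:

```python
import binascii
import functools
from typing import Any, Callable, TypeVar, Union

R = TypeVar("R")


def accepts_bytes(func: Callable[..., R]) -> Callable[..., R]:
    """Hypothetical decorator: let a hex-string function also accept raw bytes.

    Assumes the message is always the first positional argument.
    """

    @functools.wraps(func)
    def wrapper(msg: Union[str, bytes, bytearray], *args: Any, **kwargs: Any) -> R:
        if isinstance(msg, (bytes, bytearray)):
            # Convert raw bytes to the upper-case hex string the existing code expects
            msg = binascii.hexlify(msg).decode("ascii").upper()
        return func(msg, *args, **kwargs)

    return wrapper


@accepts_bytes
def typecode(msg: str) -> int:
    """Hypothetical example: type code of a DF17/18 message (bits 33-37)."""
    return int(msg[8:10], 16) >> 3


hex_msg = "8D4840D6202CC371C32CE0576098"
print(typecode(hex_msg))                 # 4
print(typecode(bytes.fromhex(hex_msg)))  # 4
```

This keeps every existing string code path untouched and only adds a hexlify call for bytes input, so it buys backward compatibility rather than speed.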
Indeed. Also, I believe it is always reasonable to keep a pure Python implementation as a fallback option. Even if it is always possible to compile the whole package for any architecture/platform/Python version, a person in charge of an open source side project (i.e. with a real day job and not much time to devote to support) may feel iffy about opening Pandora's box and trying to provide a compiled version of the library for every possible combination.
Even now, the API can support both hex strings and bytes. Please see #67.
If possible, I would like to restart the discussion on this topic. Another argument in favour of binary message processing:
Over 3 times faster. I am approaching 1 billion messages in my database. That would be almost 12 minutes vs. 3.3 minutes to extract downlink format information.
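The benchmark code from this comment did not survive, so here is a sketch (not the original) of why the binary path can be cheaper: indexing a bytes object already yields an int, while the hex path has to parse text first. The DF value is clamped to 24 on the assumption that, as in pyModeS, any first-5-bit value of 24 or more is treated as DF24 (Comm-D ELM messages are identified by their first two bits being 11):

```python
import binascii
import timeit

HEX_MSG = "8D4840D6202CC371C32CE0576098"
BIN_MSG = binascii.unhexlify(HEX_MSG)


def df_from_hex(msg: str) -> int:
    # Parse two hex characters into an int, then keep the top 5 bits
    return min(int(msg[:2], 16) >> 3, 24)


def df_from_bytes(msg: bytes) -> int:
    # Indexing bytes already yields an int, so no text parsing is needed
    return min(msg[0] >> 3, 24)


if __name__ == "__main__":
    n = 1_000_000
    t_hex = timeit.timeit(lambda: df_from_hex(HEX_MSG), number=n)
    t_bin = timeit.timeit(lambda: df_from_bytes(BIN_MSG), number=n)
    print(f"hex: {t_hex:.3f} s, bytes: {t_bin:.3f} s for {n} calls")
```

Actual ratios depend on the machine and Python version; no figures are claimed here.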
@wrobell don't give up! I agree too... 🤣
This would have to be added as a decorator to all current functions if we go down this path. I wonder whether unhexlify would slow down execution.
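One way to answer the unhexlify question empirically is a quick timeit run of the conversions alone, compared against an empty call as a baseline. This is a hypothetical measurement sketch; the per-call figures it prints are machine-specific:

```python
import binascii
import timeit

HEX_MSG = "8D4840D6202CC371C32CE0576098"


def per_call(func, number: int = 1_000_000) -> float:
    """Average wall-clock seconds per call of a zero-argument callable."""
    return timeit.timeit(func, number=number) / number


def measure(number: int = 1_000_000) -> dict:
    raw = binascii.unhexlify(HEX_MSG)
    return {
        # Cost of converting a hex string to bytes once per message
        "unhexlify": per_call(lambda: binascii.unhexlify(HEX_MSG), number),
        # Cost of the reverse conversion (bytes -> hex string)
        "hexlify": per_call(lambda: binascii.hexlify(raw), number),
        # Baseline: the cheapest possible function call, for comparison
        "noop": per_call(lambda: None, number),
    }


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name:>10}: {seconds * 1e9:.0f} ns/call")
```

If the conversion cost is small relative to the decoding work a function does, a converting decorator is cheap; if not, native bytes implementations pay off.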
Another comment:
is very hard to understand for people who are not familiar with bitwise operations. This also means we would have to change almost every function in the decoder. That's a very large project 😨
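The readability concern could be eased by hiding shifts and masks behind a small field-extraction helper. A minimal sketch; bits is a hypothetical name, and it uses 0-based bit positions (ICAO documents number bits from 1, so DF is bits 1 to 5 there):

```python
def bits(msg: bytes, start: int, length: int) -> int:
    """Return `length` bits of `msg` starting at 0-based bit `start`, MSB first.

    Hypothetical helper: hides the shifts and masks behind a field-style call.
    """
    total = len(msg) * 8
    if start < 0 or length <= 0 or start + length > total:
        raise ValueError("bit range out of bounds")
    value = int.from_bytes(msg, "big")
    # Move the wanted field down to the lowest bits, then mask off the rest
    return (value >> (total - start - length)) & ((1 << length) - 1)


msg = bytes.fromhex("8D4840D6202CC371C32CE0576098")
print(bits(msg, 0, 5))        # downlink format: 17
print(hex(bits(msg, 8, 24)))  # ICAO address: 0x4840d6
print(bits(msg, 32, 5))       # type code: 4
```

Converting the whole message to an int on every call trades speed for clarity; a hot-path version would index only the bytes it needs.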
Thanks to function overloading, we could always create optimized versions for different types of input (strings, bytes, and arrays of these; see below). Indeed, there is a lot of work, but you can always fall back on a slower implementation until a new version is provided. The real challenge is to design a nice API. I was going through the pyModeS code and realized that the most efficient implementation should use NumPy anyway. pyModeS depends on the library but does not utilize its potential, IMHO. I did some research, which led me to implement this
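To illustrate the NumPy point, here is a sketch of decoding a whole batch of binary messages at once. It is not VModeS's code (which is only linked, not shown, here); the zero-padding of 56-bit messages into 14-byte rows and the function names are assumptions:

```python
from typing import Sequence

import numpy as np


def to_array(messages: Sequence[bytes], width: int = 14) -> np.ndarray:
    """Pack messages into an (n, width) uint8 array, zero-padding short ones.

    Assumption: 56-bit (7-byte) and 112-bit (14-byte) messages share one array.
    """
    arr = np.zeros((len(messages), width), dtype=np.uint8)
    for i, m in enumerate(messages):
        if len(m) > width:
            raise ValueError(f"message {i} is longer than {width} bytes")
        arr[i, : len(m)] = np.frombuffer(m, dtype=np.uint8)
    return arr


def df_vec(arr: np.ndarray) -> np.ndarray:
    # Top 5 bits of byte 0 for every row at once, clamped to 24 (DF24)
    return np.minimum(arr[:, 0] >> 3, 24)


def typecode_vec(arr: np.ndarray) -> np.ndarray:
    # Top 5 bits of byte 4 (bits 33-37); only meaningful for DF17/18 rows
    return arr[:, 4] >> 3


msgs = [bytes.fromhex("8D4840D6202CC371C32CE0576098"), bytes.fromhex("5D484FDEA248F5")]
print(df_vec(to_array(msgs)))  # [17 11]
```

The packing loop is plain Python; in a real pipeline the fixed-width rows could come straight from the database, leaving only vectorized operations per field.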
The project includes a simplistic performance test script
The difference when processing real-life data is not as impressive, but VModeS is still 13 times faster (I will add the appropriate scripts to the above repo later). I will close this ticket now. I am planning to explore ideas for high-performance processing via the VModeS project. If, at any stage, you decide that improving the performance of pyModeS is an option, please let me know and I will try to contribute back (IMHO #67 is a starting point).
I will try to create an initial pull request in a few days. I am still exploring a few options around NumPy's API.
When I receive an ADS-B message from dump1090, I convert it to binary form with binascii.unhexlify. This is much more efficient to store than the string version of the message. To decode the messages, I read them from a database in binary form, convert them to strings using binascii.hexlify, and then use pyModeS. But internally, pyModeS converts a message to binary form again (one way or another).

It would be great if the pyModeS API allowed passing the binary version of a message. Taking a quick look at the API, it seems that single dispatch with separate type registration for str and bytes could do the trick?
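A minimal sketch of the single-dispatch idea using functools.singledispatch. icao17 is a hypothetical stand-alone function (ICAO address field, bits 9 to 32, of a DF17/18 message), not pyModeS's own icao, which also handles other downlink formats:

```python
import binascii
from functools import singledispatch


@singledispatch
def icao17(msg) -> str:
    """Hypothetical: ICAO address (bits 9-32) of a DF17/18 message."""
    raise TypeError(f"unsupported message type: {type(msg).__name__}")


@icao17.register
def _(msg: str) -> str:
    # Hex string: the address is characters 2..7
    return msg[2:8].upper()


@icao17.register(bytes)
@icao17.register(bytearray)
def _(msg) -> str:
    # Raw bytes: the address is bytes 1..3; no hex round trip needed
    return msg[1:4].hex().upper()


hex_msg = "8D4840D6202CC371C32CE0576098"
stored = binascii.unhexlify(hex_msg)    # what goes into the database
print(icao17(hex_msg), icao17(stored))  # 4840D6 4840D6
```

bytearray is registered explicitly because it is not a subclass of bytes, so a bytes-only registration would fall through to the TypeError default.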