
Performance improvement and numpy vectorization #61

Open · wrobell opened this issue Feb 27, 2020 · 14 comments
wrobell (Contributor) commented Feb 27, 2020

When I receive ADS-B messages from dump1090, I convert them to binary form with binascii.unhexlify. This is much more efficient to store than saving the string version of the messages.

To decode the messages, I read them from the database in binary form, convert them to strings using binascii.hexlify, and then use pyModeS. But internally, pyModeS converts each message back to binary form again (one way or another).

It would be great if the pyModeS API allowed passing the binary version of a message.

Taking a quick look at the API, it seems that single dispatch with separate type registrations for str and bytes could do the trick?

junzis (Owner) commented Mar 3, 2020

string vs bytes, hex vs bin... I knew this day would come 😃

I will think about this. Perhaps this would be a major change for pyModeS 3.x in the future, since many dependencies may break...

@xoolive, any thoughts?

xoolive (Collaborator) commented Mar 3, 2020

Indeed, it would be great to clarify the use of data structures, as it would be beneficial at the performance level as well.

We tried some performance improvements with Cython in the speedup branch and saw that the choice of the most relevant structures to support was the key question.

The lessons to take from this attempt, imho, are:

  • choose one structure for the core decoding (in the current state, it can be confusing at some points...) and stick to it (maybe bytearray rather than bytes, but I am not sure I remember well);
  • provide a compiled version (Cython?) of the library on PyPI (have Travis prepare and upload the builds when a new tag is added?), with a fallback to the historical pure Python for unsupported architectures;
  • provide a vanilla Python API to the core decoding functions supporting bytes, str, hex, binascii, just anything... no questions asked.

wrobell (Contributor, Author) commented Mar 3, 2020

There are a lot of projects using Cython (see uvloop or asyncpg for reference). I am not aware of any unsupported architectures. Personally, I use it in a lot of my projects, on an x86_64 laptop and on multiple ARM-based Raspberry Pis (A, Zero, 3, 4). IMHO, I would stick to Cython for certain functions and avoid code duplication.

(However, if you decide to provide two implementations, please let me know. I might have some optimizations for cprNL in common.py.)

junzis (Owner) commented Mar 3, 2020

For the moment, I have merged only the Cython code for the common package (from the speedup branch). Those are the functions that are called all the time. It is still more than twice as fast as pure Python in the benchmark.

In the current pyModeS version 2.x, even with the optional Cython c_common module, all the inputs are still strings.

> provide a vanilla Python API to the core decoding functions supporting bytes, str, hex, binascii, just anything... no questions asked.

I think this might be the most logical way to move forward. Maybe using a decorator function to check the inputs? I am not sure how much slower this is going to be...

wrobell (Contributor, Author) commented Mar 3, 2020

On compatibility and supported inputs: looking at the API, I think the first parameter of every function is a message.

Therefore, singledispatch could be used to overload on the first parameter, i.e.

from binascii import unhexlify
from functools import singledispatch

@singledispatch
def decode(msg):
    raise ValueError('Message type {} not supported'.format(type(msg)))

@decode.register(bytes)
@decode.register(bytearray)
def _decode_bin(msg):
    print('decoding type {}'.format(type(msg)))

@decode.register(str)
def _decode_str(msg):
    decode(unhexlify(msg))

decode(b'abcd')              # decoding type <class 'bytes'>
decode('abcd')               # calls _decode_str, then _decode_bin:
                             #   decoding type <class 'bytes'>
decode(bytearray(b'abcd'))   # decoding type <class 'bytearray'>

This should allow backward compatibility as well.

xoolive (Collaborator) commented Mar 3, 2020

Indeed, singledispatch would be a reasonable way to address this, though I guess the biggest chunk of the work would be to clarify the types in the core decoding functions.

Also, I believe it is always reasonable to keep a pure Python implementation as a fallback option. Even if it is in principle possible to compile the whole package for any architecture/platform/Python version, a person in charge of an open-source side project (i.e. with a real day job and not so much time to devote to support) may feel iffy about opening Pandora's box and trying to provide a compiled version of the library for every possible combination.

wrobell (Contributor, Author) commented Mar 10, 2020

Even now, the API can support both hex strings and bytes. Please see #67.

wrobell (Contributor, Author) commented Dec 4, 2020

If possible, I would like to restart the discussion about this topic.

Another argument in favour of binary message processing:

# ensure the optimized version of `df` function is imported
from pyModeS.c_common import df as pymodes_df
from binascii import unhexlify

def df_bin(msg):
    return min(24, (msg[0] & 0xf8) >> 3)

# string and binary version of the same message
msg = '904ca3a33219741c85465ae1b4c3'
msg_b = unhexlify(msg)

%timeit pymodes_df(msg)
718 ns ± 5.98 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit df_bin(msg_b)
202 ns ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Over three times faster.

I am approaching 1 billion messages in my database. At these rates, extracting the downlink format information would take almost 12 minutes versus 3.4 minutes.
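For reference, the back-of-the-envelope arithmetic behind those figures, scaling the per-message timings above to one billion messages:

```python
# scale the measured per-message timings to 1e9 messages
n = 1_000_000_000
print(f"string df:  {n * 718e-9 / 60:.1f} min")  # ≈ 12.0 minutes
print(f"binary df:  {n * 202e-9 / 60:.1f} min")  # ≈ 3.4 minutes
```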

xoolive (Collaborator) commented Dec 4, 2020

@wrobell don't give up! I agree too... 🤣

junzis (Owner) commented Dec 7, 2020

If we go down this path, this has to be added as a decorator to all the current functions.

I am wondering whether the unhexlify call would slow down execution?

junzis (Owner) commented Dec 7, 2020

Another comment:

min(24, (msg[0] & 0xf8) >> 3)

is very hard to understand for people not familiar with bitwise operations.

This also means we would have to change almost every function in the decoder. That's a very large project 😨
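For readers less familiar with bitwise operations, the expression can be unpacked step by step (a commented sketch, not the pyModeS implementation):

```python
def df_bin(msg: bytes) -> int:
    """Downlink format: the first 5 bits of the first message byte."""
    first_byte = msg[0]               # e.g. 0x90 == 0b10010000
    top5 = (first_byte & 0xF8) >> 3   # 0xF8 masks bits 7..3; the shift moves
                                      # them to the bottom: 0b10010 == 18
    return min(24, top5)              # DF codes above 24 all belong to DF24
                                      # (Comm-D ELM), hence the clamp
```

For the sample message above, `df_bin(bytes.fromhex('904ca3a33219741c85465ae1b4c3'))` gives 18.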

wrobell (Contributor, Author) commented Dec 13, 2020

Thanks to function overloading, we could always create optimized versions for different types of inputs (strings, bytes... and arrays of these), see below. Indeed, there is a lot of work, but you can always fall back on the slower implementation until a new version is provided. The real challenge is to design a nice API.

I was going through the pyModeS code and realized that the most efficient implementation should use numpy anyway. pyModeS depends on the library but does not utilize its potential, IMHO.
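To illustrate the kind of numpy vectorization meant here (a sketch with made-up data, not the actual VModeS code): if the raw messages are stored as rows of a uint8 array, the per-message Python loop disappears and the downlink format of a million messages is computed in a single array expression:

```python
import numpy as np

# N synthetic 112-bit (14-byte) messages as an (N, 14) uint8 array
rng = np.random.default_rng(0)
msgs = rng.integers(0, 256, size=(1_000_000, 14), dtype=np.uint8)

# vectorized equivalent of min(24, (msg[0] & 0xF8) >> 3) for every row
df = np.minimum(24, (msgs[:, 0] & 0xF8) >> 3)
```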

I did some research, which led me to implement this:

https://gitlab.com/wrobell/vmodes/-/blob/master/vmodes/parser.py

The project includes a simplistic performance test script:

$ PYTHONPATH=. ./scripts/perf-test
processing 6.0 million messages

vmodes: df: 0.02 sec, 2.66 ns per message
vmodes: typecode: 0.04 sec, 7.03 ns per message
vmodes: icao: 0.22 sec, 36.71 ns per message
vmodes: category: 0.04 sec, 5.93 ns per message
vmodes: callsign: 2.20 sec, 365.95 ns per message
vmodes total: 2.51 sec, 418.33 ns per message

pyModeS: df: 4.52 sec, 752.87 ns per message
pyModeS: typecode: 7.83 sec, 1305.10 ns per message
pyModeS: icao: 11.42 sec, 1903.81 ns per message
pyModeS: category: 8.81 sec, 1468.20 ns per message
pyModeS: callsign: 15.47 sec, 2578.46 ns per message
pyModeS total: 48.05 sec, 8008.47 ns per message

The difference when processing real-life data is not as impressive, but VModeS is still 13 times faster (I will add the appropriate scripts to the above repo later).

I will close this ticket now. I am planning to explore ideas for high-performance processing in the VModeS project. If, at any stage, you decide that improving the performance of pyModeS is an option, please let me know and I will try to contribute back (IMHO, #67 is a starting point).

wrobell closed this as completed Dec 13, 2020
junzis (Owner) commented Dec 14, 2020

@wrobell, the performance looks quite promising.
Would you be interested in working on a new pyModeS branch directly?
This way, @xoolive and I can also help with integrating it with the rest of the code.

junzis reopened this Dec 14, 2020
junzis changed the title from "extend api to support binary version of ADS-B messages" to "Performance improvement and numpy vectorization" Dec 14, 2020
wrobell (Contributor, Author) commented Dec 16, 2020

I will try to create an initial pull request in a few days.

I am still exploring a few options around NumPy's API.
