Add expected pass model #242

Open
Alek050 opened this issue Aug 19, 2024 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@Alek050
Owner

Alek050 commented Aug 19, 2024

A physical model that predicts the likelihood of a successful pass given the locations and velocities of all players, the initial ball velocity, and the ball's movement angle.

@Alek050 Alek050 added the enhancement New feature or request label Aug 19, 2024
@jonas-bischofberger

I have started working on the implementation of the model and am currently encountering two major pain points:

  1. Normalized coordinates (wrt attacking direction) are not by default included in the processed data even though they must be used somewhere such as in xG and xT - is there a standardized way to obtain them that I'm missing? At the moment, I would do databallpy.features.add_team_possession to get the possession info and use that to calculate the normalized coordinates myself.
  2. The tabular tracking data format I obtain via match.tracking_data does not make sense to me - currently a row corresponds to an entire frame of tracking data rather than an object-position pair. This means that I don't have, and can't add, any metadata about the players (e.g. to identify which team a player belongs to), and I also can't join player identities with the event data (e.g. to exclude the passer from potential receivers). Is there a built-in way to get a different table format and to get the missing mapping information between tracking and event data?

@Alek050
Owner Author

Alek050 commented Sep 23, 2024

Hi @jonas-bischofberger, thanks for your message and great to see that you started!

  1. Right now there is no built-in way to normalize coordinates wrt attacking direction. The playing direction for the home team is always from left to right, and for the away team from right to left. I will open an issue to create a built-in way to get normalized coordinates wrt attacking direction.

For now, there are two scenarios. If you only need the tracking and event data at the moment of the pass, use the team_id column in the event data to find out whether it is the match.home_team_id or the match.away_team_id. If it is the away team id, you have to multiply all _x, _vx (and _ax) columns by -1 in the tracking data, and the start_x, start_y (and end_x, end_y) in the event data. If you need it normalized for all frames, not only the ones where events happen, the approach you use right now is the only solution.

  2. This was a design choice at the beginning of the package. All the metadata about the players can be found in match.home_players and match.away_players. You can use match.player_id_to_column_id() to match player ids to the column id in the tracking data (which is f"{team_side}_{jersey_number}"). Also check out match.home_players_column_ids() or match.away_players_column_ids() to get a list of column ids for an entire team.

Lastly, check out match.passes_df or match.pass_events for more info. For instance, match.pass_events is a dict of PassEvents with attributes like team_side and start_x. The PassEvents should generally work, but they are still in beta, so there might be some bugs. On top of that, I have limited access to Metrica data, so there might be some weird edge cases.
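The sign flip described in point 1 can be sketched with plain pandas. This is a minimal illustration, not databallpy code: the helper name flip_columns, the toy column names, and the team ids are assumptions made for the example, and the default suffixes simply follow the _x, _vx, _ax columns mentioned above.

```python
import pandas as pd


def flip_columns(df, suffixes=("_x", "_vx", "_ax")):
    """Return a copy of df with every column ending in one of `suffixes` negated."""
    out = df.copy()
    cols = [c for c in out.columns if c.endswith(tuple(suffixes))]
    out[cols] = -out[cols]
    return out


# Toy frame of tracking data (column names are illustrative assumptions).
tracking = pd.DataFrame(
    {"home_1_x": [10.0], "home_1_y": [3.0], "home_1_vx": [2.0], "ball_x": [-5.0]}
)

# Hypothetical ids; in practice they would come from the event and the match.
event_team_id, away_team_id = 22, 22

# Only flip when the passing team is the away team (playing right to left).
normalized = flip_columns(tracking) if event_team_id == away_team_id else tracking
```

The same helper could be applied to the event row with suffixes=("start_x", "start_y", "end_x", "end_y"); which columns to negate is passed in explicitly so that the choice stays visible at the call site.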

If you have any ideas/updates on how to make the package more intuitive and easier to use, please let me know so I can make some changes to the package and make it easier for anyone to use.

@Alek050
Owner Author

Alek050 commented Oct 16, 2024

@jonas-bischofberger

For point 2, would you prefer to have a double indexed (frame number, player) pd.DataFrame? Or what would you propose? Adding different types of representations of the data could be considered for future versions.
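As a rough sketch of what a double indexed (frame number, player) table could look like, the wide format can be reshaped with plain pandas. The function name to_long_format, the "frame" column name, and the f"{column_id}_{suffix}" column naming are assumptions for illustration only, not an existing databallpy API.

```python
import pandas as pd


def to_long_format(df_wide, column_ids, frame_col="frame", suffixes=("x", "y", "vx", "vy")):
    """Reshape one-row-per-frame data into a (frame, player) indexed DataFrame."""
    parts = []
    for col_id in column_ids:
        # Map e.g. "home_1_x" -> "x" for every suffix present in the wide table.
        mapping = {
            f"{col_id}_{s}": s for s in suffixes if f"{col_id}_{s}" in df_wide.columns
        }
        part = df_wide[[frame_col, *mapping]].rename(columns=mapping)
        part["player"] = col_id
        parts.append(part)
    long = pd.concat(parts, ignore_index=True)
    return long.set_index([frame_col, "player"]).sort_index()


wide = pd.DataFrame(
    {
        "frame": [1, 2],
        "home_1_x": [0.0, 1.0],
        "home_1_y": [0.0, 0.5],
        "away_7_x": [10.0, 9.0],
        "away_7_y": [5.0, 5.0],
    }
)
long = to_long_format(wide, ["home_1", "away_7"], suffixes=("x", "y"))
```

With this layout, long.loc[(2, "away_7")] returns the position of one player in one frame, and per-player metadata such as the team can be joined on the "player" index level.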

@jonas-bischofberger

For the time being, I'm using this function to do the conversion. I didn't test it thoroughly but it seems to do the job for my current purposes!

import pandas as pd


def per_object_frameify_tracking_data(
    df_tracking, frame_col, x_cols, y_cols, vx_cols, vy_cols, players, player_to_team,
    new_x_col="x", new_y_col="y", new_vx_col="vx", new_vy_col="vy",
    new_player_col="player_id", new_team_col="team_id", v_cols=None, new_v_col="v",
):
    """Convert tracking data from '1 row per frame' into '1 row per frame + player' format."""
    dfs_player = []
    for player_nr, player in enumerate(players):
        # Map this player's wide-format columns to the generic long-format names.
        coordinate_mapping = {
            x_cols[player_nr]: new_x_col,
            y_cols[player_nr]: new_y_col,
            vx_cols[player_nr]: new_vx_col,
            vy_cols[player_nr]: new_vy_col,
        }
        if v_cols is not None:
            coordinate_mapping[v_cols[player_nr]] = new_v_col
        # Copy so that adding columns below does not write into a slice of df_tracking.
        df_player = df_tracking[[frame_col] + list(coordinate_mapping)].copy()
        df_player = df_player.rename(columns=coordinate_mapping)
        df_player[new_player_col] = player
        df_player[new_team_col] = player_to_team.get(player, None)
        dfs_player.append(df_player)

    df_player = pd.concat(dfs_player, axis=0)

    # Every wide-format column that has already been moved into the long format.
    all_coordinate_columns = x_cols + y_cols + vx_cols + vy_cols
    if v_cols is not None:
        all_coordinate_columns += v_cols

    # Frame-level columns (e.g. ball data) are joined back onto every player row.
    remaining_cols = [col for col in df_tracking.columns if col not in [frame_col] + all_coordinate_columns]

    return df_player.merge(df_tracking[[frame_col] + remaining_cols], on=frame_col, how="left")

I'm also basically done with the implementation of the model, but I decided that I would like to bundle the core functionality of my model into its own library so that it can be used as independently as possible. I currently have what I believe is a relatively efficient backend implementation and an interface that comes with (a) a function that adds a column "xC" (expected completion) to a dataframe containing passes given a supplementary dataframe containing tracking data and (b) a function that adds a column "AS" and a column "DAS" adding (dangerous) accessible space to tracking data.

I would need a little bit of guidance on how to proceed from here - I can offer to implement the functionality of (a) and (b), adding the respective columns to databallpy dataframes using my own to-be-published library. I could also write a dashboard or notebook that walks through and illustrates the computational steps involved in the model with an example, without implementing the vectorized version in databallpy itself. Let me know what you think, @Alek050!

@Alek050
Owner Author

Alek050 commented Oct 25, 2024

Hi @jonas-bischofberger,

Great to hear you are making progress and are planning on implementing the model in its own package! I am happy to share that the new version of DataBallPy will have a function to normalize tracking data relative to attacking direction (see develop docs).

Depending a little on how well you are planning to maintain your own package, I think it would be most logical to do as you propose: add a small translator in databallpy that converts the databallpy format to what your package expects as input, and get back the xC, AS, and DAS (depending on what the user asks for).

With that, I would still really love to have a notebook explaining how you get to your results in code, text and visualizations. For the precise vectorized implementation we will just refer to your package and paper so it's clear where the nerds can find the real code.

I think that the implementation in DataBallPy has to wait until you have released your package, but if you want you can get started on the docs with some explanations of the calculations etc. (should be added in docs/features/{name your passing model}.ipynb)

If you have any questions you know where to find me (we can also plan in a meeting to discuss this).
