[solidago] Implement get_individual_scores & get_collective_scores #1994

Merged
amatissart merged 25 commits into main from solidago-pipeline-users-scores
Oct 31, 2024

Conversation

NatNgs
Collaborator

@NatNgs NatNgs commented Jul 4, 2024

Description

Implemented the get_individual_scores and get_collective_scores functions of the Solidago pipeline input.

  • Added user_id and entity_id columns to the dataframes, as was already done for the users and comparisons dataframes
  • Added unit tests in test_pipeline.py (which seems to be the right place, judging by the name)
  • Initially based on master, then rebased onto the neurips24 branch as requested

Checklist

  • I added the related issue(s) id in the related issues section (if any)
    • if not, delete the related issues section
  • I described my changes and my decisions in the PR description
  • I read the development guidelines of the CONTRIBUTING.md
  • The tests pass and have been updated if relevant
  • The code quality checks pass

lenhoanglnh and others added 19 commits May 12, 2024 17:25
Fixed tiny_tournesol.zip file for testing.
Added data_analysis for dataset submission.
WIP Runtime error on icml24 experiments to be fixed
…than additional term.

This implies that the addition of a new user with huge uncertainties will not affect the quantile much.
… of neg. log likelihood by 1 (#1973)

---------

Co-authored-by: Louis Faucon <lpfaucon@gmail.com>
[solidago] Update docstrings and add simple API for `Pipeline`
@NatNgs NatNgs added the python (Pull requests that update Python code) and Solidago (Tournesol algorithms library) labels Jul 4, 2024
@NatNgs
Collaborator Author

NatNgs commented Oct 10, 2024

Things to improve I just found:

  • Some columns are formatted as float but can be formatted as int (collective_scores.users, collective_scores.comparisons)
  • There are certainly improvements to be made: the same result can probably be obtained with cleverer pandas functions that I don't know about

Base automatically changed from neurips24 to main October 24, 2024 09:41
@amatissart
Member

Some columns are formatted as float but can be formatted as int (collective_scores.users, collective_scores.comparisons)

@NatNgs This is actually due to how pandas converts integers to floats when there are missing values (NaN) in the columns created by the "left join". Specifically, some entities that are present in the collective scores may not appear in "comparisons.csv" (this is the case when a video has only been compared privately).

I tried to make that more explicit in another branch. See PR #2022 targeting this branch and the comments in the changes. Let me know if you have other pending work on this branch.
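
For readers who want to reproduce the int-to-float conversion described above, here is a minimal sketch. The dataframes and their values are made up for illustration and are not the actual Solidago data or code. The two conversions at the end are only possible ways to get integers back, not necessarily what PR #2022 does.

```python
import pandas as pd

# Hypothetical data: entity 3 has a collective score but no public comparison.
collective_scores = pd.DataFrame(
    {"entity_id": [1, 2, 3], "score": [12.5, -3.0, 40.1]}
)
comparison_counts = pd.DataFrame(
    {"entity_id": [1, 2], "comparisons": [7, 2]}
)

# The left join keeps every scored entity, including those without a match.
merged = collective_scores.merge(comparison_counts, on="entity_id", how="left")

# The unmatched row gets NaN, so pandas upcasts the int64 column to float64.
print(merged["comparisons"].dtype)

# Option 1 (assumption): keep the missing value, using the nullable "Int64" dtype.
nullable = merged["comparisons"].astype("Int64")

# Option 2 (assumption): treat "no public comparison" as zero, then cast to int64.
filled = merged["comparisons"].fillna(0).astype("int64")
```

Whether a missing count should stay missing or become zero depends on what the column is meant to express, so the choice between the two options is a design decision rather than a technical one.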

@amatissart amatissart changed the title [solidago] Implmented get_individual_scores & get_collective_scores [solidago] Implement get_individual_scores & get_collective_scores Oct 31, 2024
@amatissart amatissart merged commit 714c581 into main Oct 31, 2024
8 checks passed
@amatissart amatissart deleted the solidago-pipeline-users-scores branch October 31, 2024 15:43
Labels
python (Pull requests that update Python code), Solidago (Tournesol algorithms library)
4 participants