optimize statistics #214

Open
immone opened this issue Aug 24, 2024 · 0 comments

immone commented Aug 24, 2024

The statistics features in the application may become slow as the number of articles $n$ grows large, which would make the article statistics views sluggish. Here are some options to consider for speeding this up:

  • The domain and time series statistics are currently served by a single API endpoint. Refactor this into 3 separate endpoints in server/src/routes.py, with the corresponding DB functions in server/src/views/data_analysis/stats_analyzer.py.

  • Implement some level of caching so that the statistics are not fetched again if the articles table has not been updated.

  • The mapping from all news articles to individual word counts is currently done entirely in the front end. It is $\mathcal O(n)$, with a similar number of accesses to the frequency data structure. It might be faster to use, e.g., Python's pandas.Series.value_counts in the back end and then just send the JSON over.
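
As a rough illustration of the last point, here is a minimal sketch of counting words in the back end with pandas.Series.value_counts. The function name `word_frequencies`, its `top_n` parameter, and the simple `\w+` tokenization are assumptions made for this example and are not taken from the existing code base.

```python
import pandas as pd


def word_frequencies(texts, top_n=None):
    """Count word occurrences across article texts.

    Hypothetical helper: the name, signature, and tokenization rule
    are assumptions for illustration only.
    """
    words = (
        pd.Series(texts, dtype="string")
        .str.lower()
        .str.findall(r"\w+")  # list of tokens per article
        .explode()            # one row per token
        .dropna()             # drop articles that produced no tokens
    )
    counts = words.value_counts()  # sorted by frequency, descending
    if top_n is not None:
        counts = counts.head(top_n)
    # Convert NumPy integers to plain ints so the result is JSON-serializable.
    return {word: int(n) for word, n in counts.items()}
```

A route could then serialize the returned dict to JSON, so the front end only receives the aggregated counts instead of every article.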
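
The caching idea could be sketched roughly as follows, assuming the back end can cheaply obtain some "version" token for the articles table (for example the row count together with the latest insert timestamp). `StatsCache`, `get_version`, and `compute` are hypothetical names for this sketch, not existing functions in the project.

```python
from typing import Any, Callable, Hashable


class StatsCache:
    """Recompute statistics only when the articles table's version changes.

    Hypothetical sketch: `get_version` is assumed to return a cheap,
    hashable token that changes whenever the articles table is updated.
    """

    def __init__(self, get_version: Callable[[], Hashable],
                 compute: Callable[[], Any]) -> None:
        self._get_version = get_version
        self._compute = compute
        self._version: Hashable = None
        self._value: Any = None
        self._filled = False

    def get(self) -> Any:
        version = self._get_version()
        # Recompute on first use or when the table version has changed.
        if not self._filled or version != self._version:
            self._value = self._compute()
            self._version = version
            self._filled = True
        return self._value
```

This only pays off if the version query is much cheaper than the statistics query itself. With several server processes, each process would hold its own copy of the cache.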
