francisco3511/stocksense

Stock Classifier and Analytics

CI Python 3.10+ License: MIT

This project implements a dynamic stock selection system that uses an Adaptive Genetic Algorithm-optimized XGBoost (GA-XGBoost) classifier to identify stocks with potential market outperformance. The model analyzes quarterly financial statements, market data, insider trading patterns, and other external data to rank and select stocks expected to outperform the S&P 500 index over a one-year horizon. The project also includes a Streamlit-based analytics dashboard that provides stock analysis tools, including technical indicators, financial metrics visualization, and model-driven insights.

Table of Contents

  1. Project Overview
  2. Features
  3. Installation
  4. Usage
  5. Contributing
  6. Acknowledgments & References
  7. License

Project Overview

The system identifies S&P 500 constituents that are likely to outperform the index itself. It operates on a quarterly basis, aligned with earnings seasons, so that the investment strategy adapts as new financial data is published.

The core engine combines three key components:

  1. Data Pipeline

    • Automated collection of S&P 500 constituent data
    • Integration of multiple data sources:
      • Quarterly financial statements and earnings reports
      • Daily market data and technical indicators
      • Real-time insider trading patterns
      • Company-specific growth metrics
      • Macroeconomic indicators
  2. ML-Powered Stock Selection

    • Quarterly Investment Cycle:

      • Four trading dates per year aligned with earnings seasons
      • Automated data refresh and feature engineering on each date
      • Model retraining with expanding window of historical data
    • Stock Scoring Process:

      • GA-XGBoost classifier generates outperformance probabilities
      • Stocks ranked by probability of beating S&P 500 over next 12 months
      • Feature importance analysis for investment decision transparency
      • Portfolio rebalancing recommendations based on new scores
  3. Analytics Platform

    • Interactive Streamlit dashboard providing:
      • Market-wide sector analysis and trends
      • Individual stock deep-dives with technical indicators
      • Financial ratio comparisons and visualizations
      • Insider trading pattern analysis
      • Model-driven investment insights
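
The retraining schedule and ranking step described under "ML-Powered Stock Selection" can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the `min_train` parameter, and the sample dates, tickers, and probabilities are all hypothetical.

```python
from datetime import date


def expanding_window_splits(trade_dates, min_train=2):
    """Yield (train_dates, test_date) pairs where the training window only grows."""
    ordered = sorted(trade_dates)
    for i in range(min_train, len(ordered)):
        # Train on every trade date strictly before the one being scored.
        yield ordered[:i], ordered[i]


def rank_by_probability(probabilities, top_n=None):
    """Sort tickers by predicted outperformance probability, highest first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical quarterly trade dates and classifier outputs, for illustration only.
dates = [date(2023, 3, 1), date(2023, 6, 1), date(2023, 9, 1), date(2023, 12, 1)]
splits = list(expanding_window_splits(dates))
scores = {"AAA": 0.41, "BBB": 0.77, "CCC": 0.58}
top_picks = rank_by_probability(scores, top_n=2)
```

Each split never trains on data from the date being scored or later, which is the property an expanding window is meant to guarantee.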

The system maintains a SQLite database for efficient data management and provides a command-line interface for data updates, model training, and stock scoring operations. The model's predictions can be accessed through reports generated by the CLI (stocksense --score) and the Streamlit dashboard's "Stock Picks" page.
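
As a rough sketch of how market data could be kept in SQLite and updated repeatedly without duplicates, the example below uses Python's standard `sqlite3` module. The table name, columns, and sample rows are assumptions made for illustration; the project's actual schema may differ.

```python
import sqlite3

# Hypothetical schema: the project's actual tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE market (
        ticker TEXT NOT NULL,
        date   TEXT NOT NULL,
        close  REAL NOT NULL,
        PRIMARY KEY (ticker, date)
    )
    """
)


def upsert_prices(connection, rows):
    """Insert rows, replacing any existing row with the same (ticker, date)."""
    connection.executemany("INSERT OR REPLACE INTO market VALUES (?, ?, ?)", rows)
    connection.commit()


def latest_close(connection, ticker):
    """Return the most recent closing price stored for a ticker, or None."""
    row = connection.execute(
        "SELECT close FROM market WHERE ticker = ? ORDER BY date DESC LIMIT 1",
        (ticker,),
    ).fetchone()
    return None if row is None else row[0]


upsert_prices(conn, [("AAA", "2024-01-02", 100.0), ("AAA", "2024-01-03", 101.5)])
upsert_prices(conn, [("AAA", "2024-01-03", 102.0), ("BBB", "2024-01-02", 50.0)])
row_count = conn.execute("SELECT COUNT(*) FROM market").fetchone()[0]
```

The composite primary key makes a repeated update for the same ticker and date overwrite the earlier row rather than add a second one.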

Features

  • Model Training

    • Adaptive GA-XGBoost classifier with optimized hyperparameters
    • Feature engineering including growth ratios, financial metrics, price momentum, and volatility
    • Expanding window cross-validation and performance metrics
  • Streamlit App

    • Market overview dashboard
    • Individual stock analysis with technical indicators
    • Financial ratio visualization
    • Insider trading patterns
    • Model predictions and insights
  • Data Management

    • SQLite database with market, financial, and insider trading data
    • Automated data updates and validation
    • Historical S&P 500 constituent tracking
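
To give a feel for genetic-algorithm hyperparameter search, here is a small sketch in pure Python. It is a plain GA with fixed mutation settings, not the adaptive variant the project uses, and `toy_fitness` is a made-up stand-in for a cross-validated model score. The two genes loosely mimic a tree depth and a learning rate; all names and ranges are hypothetical.

```python
import random


def toy_fitness(params):
    """Stand-in for a cross-validated model score; peaks at depth=6, rate=0.1."""
    depth, rate = params
    return -((depth - 6) ** 2) - 100 * (rate - 0.1) ** 2


def random_individual(rng):
    return (rng.randint(2, 12), rng.uniform(0.01, 0.5))


def mutate(params, rng):
    """Nudge each gene slightly while keeping it inside its allowed range."""
    depth, rate = params
    depth = min(12, max(2, depth + rng.choice([-1, 0, 1])))
    rate = min(0.5, max(0.01, rate + rng.gauss(0, 0.02)))
    return (depth, rate)


def crossover(a, b, rng):
    # Uniform crossover: each gene is taken from one parent at random.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def run_ga(generations=20, pop_size=20, elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(population[0]))
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        # Elitism: the best individuals survive unchanged, so the best score never drops.
        population = population[:elite] + children
    return max(population, key=toy_fitness), history


best_params, best_per_generation = run_ga()
```

In a real setting the fitness function would train and validate a classifier for each candidate, which is far more expensive than this toy objective.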

Installation

  1. Clone the repository:

    git clone https://github.com/francisco3511/stocksense.git
    cd stocksense
  2. Install dependencies using pyproject.toml:

    pip install .

Development Setup

  1. Install development dependencies:

    pip install -e ".[dev]"
  2. Install pre-commit hooks:

    chmod +x install-hooks.sh
    ./install-hooks.sh

Usage

Data Management

The project uses a trading date observation window that sets four portfolio rebalancing dates per year. By default, the most recent trading date is used for model training and stock scoring.
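
The idea of four fixed trade dates per year, with a default of the most recent one, can be sketched like this. The months used here (March, June, September, December, on the first day) are an assumption for illustration only; the project's actual trade dates may differ, and the function names are hypothetical.

```python
from datetime import date

# Assumed rebalancing months; the project's actual trade dates may differ.
TRADE_MONTHS = (3, 6, 9, 12)


def trade_dates_for_year(year):
    """Return the four assumed trade dates for a calendar year."""
    return [date(year, month, 1) for month in TRADE_MONTHS]


def latest_trade_date(today):
    """Return the most recent trade date on or before `today`."""
    # Include the previous year so that dates in January and February resolve.
    candidates = trade_dates_for_year(today.year - 1) + trade_dates_for_year(today.year)
    return max(d for d in candidates if d <= today)


default_date = latest_trade_date(date(2024, 7, 20))
```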

First, update the stock database:

stocksense --update

Model Training

Train the model for a given trade date:

stocksense --train --trade-date YYYY-MM-DD

Score stocks for a given trade date:

stocksense --score --trade-date YYYY-MM-DD

To train or score for the most recent trading date, omit --trade-date.
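
For scripted quarterly runs, the three commands above could be chained from Python. The sketch below uses only the flags shown in this README; the helper names are hypothetical, and it assumes the `stocksense` executable is on the PATH and that each step is run as a separate invocation.

```python
import subprocess


def build_command(action, trade_date=None):
    """Build the argument list for one stocksense CLI call."""
    if action not in {"update", "train", "score"}:
        raise ValueError(f"unsupported action: {action}")
    command = ["stocksense", f"--{action}"]
    # Omitting --trade-date lets the CLI fall back to the last trading date.
    if trade_date is not None and action != "update":
        command += ["--trade-date", trade_date]
    return command


def run_cycle(trade_date=None, runner=subprocess.run):
    """Run update, train, and score in order, stopping at the first failure."""
    for action in ("update", "train", "score"):
        runner(build_command(action, trade_date), check=True)
```

Calling `run_cycle()` with no arguments would run the full cycle for the default trade date, while `run_cycle("YYYY-MM-DD")` with a real date targets a specific one. Passing `check=True` makes a failed step raise instead of silently continuing to the next.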

Streamlit App

To open the Streamlit app:

stocksense-app

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Acknowledgments & References

This project's methodology was inspired by the following research:

Yang, H., Liu, X. Y., & Wu, Q. (2020). A Practical Machine Learning Approach for Dynamic Stock Recommendation. Columbia University. [paper]

Ye, Z. J., & Schuller, B. W. (2023). Capturing dynamics of post-earnings-announcement drift using a genetic algorithm-optimized XGBoost. Imperial College London. [paper]

Liu, X. Y., Yang, H., & Chen, Q. (2019). A Sustainable Quantitative Stock Selection Strategy Based on Dynamic Factor Adjustment. Columbia University. [paper]

License

This project is licensed under the MIT License.

About

Value stock selection model and analytics tool
