This project analyzes the Bitcoin OTC trust network using data mining techniques. The study compares a classical matrix-factorization approach, Alternating Least Squares (ALS), with advanced machine learning models, specifically deep learning methods. The dataset is publicly available from the Stanford Network Analysis Project (SNAP).
The primary goal is to predict trust relationships within the Bitcoin OTC network, balancing accuracy, interpretability, and computational efficiency. This research provides insights into the dynamics of trust and security within decentralized trading environments.
Our approach involves two main methodologies:
- Alternating Least Squares (ALS; see als.py)
- Deep learning models (Model R and Transformers; see bitcoin_otc_torch.py)
The raw ratings were normalized to the range [-1, 1] and split into training (70%), validation (10%), and test (20%) sets. Mean Squared Error (MSE) was used as the evaluation metric.
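The preprocessing steps above can be sketched as follows. This is an illustrative NumPy version, not the project's actual pipeline; it assumes the Bitcoin OTC rating scale of -10 to +10 (as documented by SNAP), and the function names and seed are hypothetical:

```python
import numpy as np

def normalize_and_split(edges, seed=42):
    """edges: rows of (source, target, rating); ratings assumed in [-10, 10].
    Normalizes ratings to [-1, 1] and shuffles into 70/10/20 splits."""
    edges = np.asarray(edges, dtype=float)
    edges[:, 2] = edges[:, 2] / 10.0          # [-10, 10] -> [-1, 1]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(edges))          # random shuffle before splitting
    n_train = int(0.7 * len(edges))
    n_val = int(0.1 * len(edges))
    train = edges[idx[:n_train]]
    val = edges[idx[n_train:n_train + n_val]]
    test = edges[idx[n_train + n_val:]]
    return train, val, test

def mse(pred, actual):
    """Mean Squared Error, the evaluation metric used throughout."""
    return float(np.mean((np.asarray(pred) - np.asarray(actual)) ** 2))
```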
- ALS: MSE = 0.1250
- Model R: MSE = 0.1026
- Transformers: MSE = 0.0928
The results indicate that while deep learning models achieve higher accuracy, they are computationally intensive. ALS, on the other hand, is more computationally efficient and suitable for environments with limited resources.
ALS is a classical matrix-factorization method used in recommendation systems; it alternately solves regularized linear least-squares subproblems. It projects users and items into a k-dimensional latent space, approximating the ratings through the inner product of their latent feature vectors. For details on the theoretical derivation, program implementation, and result analysis, please refer to the project report under the deliverables directory.
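The ALS idea can be sketched briefly: holding the item factors fixed, each user's factor vector is the solution of a ridge-regression problem, and vice versa. The following minimal NumPy sketch is illustrative only, not the implementation in als.py; the hyperparameters (k, lam, iters) are placeholders:

```python
import numpy as np

def als(R, mask, k=8, lam=0.1, iters=20, seed=0):
    """Alternating Least Squares sketch.
    R: (m, n) rating matrix; mask: boolean array of observed entries.
    Returns latent factors U (m, k) and V (n, k) with R ~ U @ V.T."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    I = np.eye(k)
    for _ in range(iters):
        # Fix V, solve a ridge regression for each user's factors.
        for i in range(m):
            obs = mask[i]
            if obs.any():
                A = V[obs].T @ V[obs] + lam * I
                U[i] = np.linalg.solve(A, V[obs].T @ R[i, obs])
        # Fix U, solve a ridge regression for each item's factors.
        for j in range(n):
            obs = mask[:, j]
            if obs.any():
                A = U[obs].T @ U[obs] + lam * I
                V[j] = np.linalg.solve(A, U[obs].T @ R[obs, j])
    return U, V
```

A predicted rating for user i and item j is then simply `U[i] @ V[j]`.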
Two deep-learning models were employed:
- Model R
- Transformers
These models leverage neural networks to predict link weights more accurately than ALS, though at the cost of increased computational requirements.
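To illustrate the link-weight-prediction idea behind such models (this is a rough NumPy sketch in the spirit of Model R, not the actual bitcoin_otc_torch.py implementation; all hyperparameters and function names are hypothetical), one can learn node embeddings jointly with a small network that maps a (source, target) embedding pair to a normalized weight:

```python
import numpy as np

def train_model_r(edges, n_nodes, d=8, h=16, lr=0.05, epochs=200, seed=0):
    """Learn node embeddings E and a one-hidden-layer MLP that maps
    concat(E[src], E[dst]) to a link weight, trained on MSE by gradient descent."""
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_nodes, d))          # node embeddings
    W1 = rng.normal(scale=0.1, size=(2 * d, h)); b1 = np.zeros(h)
    W2 = rng.normal(scale=0.1, size=h); b2 = 0.0
    src = edges[:, 0].astype(int)
    dst = edges[:, 1].astype(int)
    y = edges[:, 2]
    n = len(y)
    for _ in range(epochs):
        x = np.concatenate([E[src], E[dst]], axis=1)      # (n, 2d) inputs
        z = np.tanh(x @ W1 + b1)                          # hidden activations
        pred = z @ W2 + b2                                # predicted weights
        err = pred - y
        # Backpropagation of the mean-squared error.
        gW2 = z.T @ err / n; gb2 = err.mean()
        gz = np.outer(err, W2) * (1 - z ** 2)             # tanh derivative
        gW1 = x.T @ gz / n; gb1 = gz.mean(axis=0)
        gx = gz @ W1.T
        gE = np.zeros_like(E)
        np.add.at(gE, src, gx[:, :d] / n)                 # accumulate per-node grads
        np.add.at(gE, dst, gx[:, d:] / n)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
        E -= lr * gE
    return E, (W1, b1, W2, b2)

def predict(E, params, src, dst):
    W1, b1, W2, b2 = params
    x = np.concatenate([E[src], E[dst]], axis=1)
    return np.tanh(x @ W1 + b1) @ W2 + b2
```

The Transformer variant replaces this MLP with attention layers; see bitcoin_otc_torch.py for the models actually used.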
The study demonstrates that deep learning models outperform ALS in terms of prediction accuracy. However, ALS remains a viable option for scenarios with limited computational resources due to its efficiency.
- Changrong Li (李昶融): Project coordination, ALS work (literature research, theoretical derivation, development, deployment, result analysis, performance optimisation), report writing (full-text quality control, title, data processing, ALS chapters, appendix).
- Mānūśrī Tyāgī (मांनुश्री त्यागी): Initial proposal frame, ALS work with Changrong Li (development of 3 metrics), summary generation.
- Zhiqiang Yu (余志强): Theoretical research, literature review, validation of deep learning methods, verifying the accuracy and effectiveness of deep learning methods in collaboration with Ziyun, report writing (introduction, methods, machine learning section, discussion, conclusion).
- Ziyun Pan (潘梓韫): Application of deep learning methods, data preprocessing, and dataset splitting, report writing (introduction, methods, machine learning section, discussion, conclusion).
Future research will address data imbalances using techniques like SMOTE or GANs, enhance model scalability, and perform a cost-benefit analysis for practical applications.
- Jun 17 at 1:34pm
Good report! The results part could have been improved. It could have been nice to include some more evaluation metrics other than MSE to compare your methods + use confidence intervals (table 3) - Rémi Bourgerie (teaching assistant)
- Jun 17 at 1:34pm
Since it is visible from the "Individual Contribution" section that you were the main coordinator and were overseeing all parts of this project, we will give you maximum points for this effort. - Sarunas Girdzijauskas (lecturer, examiner)
This project is licensed under the GPLv3 License. See the LICENSE file for details.