[Python + Google Cloud Platform (Google Cloud Storage, BigQuery, Cloud Functions, Compute Engine)] VNStock Daily Data
Problem : I want data on the Vietnamese stock market so that I can make data-driven decisions when buying and selling stock tickers.
Requirements:
- Snapshot all historical stock data up to the present, store it in GCS, and write it to BigQuery.
- Daily data is updated at 16:00 every day. The pipeline stores the file in GCS and imports it into BigQuery, and must be fully automated.
Step 1 : Clone my project
git clone https://github.com/thangnh1/Vnstock-Data-GCP
Step 2 : Open the project in your editor and run the following command in a terminal
pip install -r requirements.txt
Step 3 : Run file get_data.py
python get_data.py
After running get_data.py, the file data.csv contains all stock data from the past up to the present.
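For orientation, here is a minimal sketch of the kind of logic such a script could use, assuming the legacy vnstock API (listing_companies and stock_historical_data); the actual get_data.py in the repo may differ in function names, parameters, and columns.

```python
# Minimal sketch (NOT the repo's get_data.py): download full history for every ticker.
# Assumes the legacy vnstock API; function and column names may differ between versions.
import pandas as pd
from vnstock import listing_companies, stock_historical_data

tickers = listing_companies()["ticker"].tolist()  # assumed column name

frames = []
for symbol in tickers:
    try:
        df = stock_historical_data(symbol=symbol,
                                    start_date="2000-01-01",
                                    end_date=pd.Timestamp.today().strftime("%Y-%m-%d"))
        df["ticker"] = symbol
        frames.append(df)
    except Exception as exc:  # some tickers may have no data
        print(f"Skipping {symbol}: {exc}")

pd.concat(frames, ignore_index=True).to_csv("data.csv", index=False)
```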
Step 4 : Load data to GCS and BigQuery
In the Google Cloud Console, create a new project and enable the APIs for the services you will use: BigQuery, Cloud Storage, Compute Engine, and Cloud Functions.
Then search for IAM & Admin. In the IAM & Admin navigation menu, choose Service Accounts, click CREATE SERVICE ACCOUNT, and fill in the account information.
Now your service account is created!
Select the newly created account and switch to the Keys tab > Add key > Create new key.
Choose the JSON key type and click Create; a JSON file containing your credentials will be downloaded to your local machine.
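If a script relies on Application Default Credentials rather than an explicit key path (this depends on how the repo's scripts are written), you can point the Google client libraries at the downloaded key with an environment variable:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-key.json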
Back in your editor, open push_data.py, edit the variable values to match your project, then run the command python push_data.py.
Check the results in GCS and BigQuery in the Google Cloud Console.
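For reference, the upload-and-load step can be done with the official client libraries as in the sketch below. This is an illustration, not the repo's push_data.py; the key path, bucket, dataset, and table names are placeholders to replace with your own values.

```python
# Illustrative sketch: upload data.csv to GCS, then load it into BigQuery.
# KEY_PATH, BUCKET, and TABLE_ID are placeholders, not the repo's values.
from google.cloud import bigquery, storage

KEY_PATH = "service-account-key.json"
BUCKET = "my-vnstock-bucket"
TABLE_ID = "my-project.vnstock.daily_prices"

# 1. Upload the CSV to Cloud Storage.
storage_client = storage.Client.from_service_account_json(KEY_PATH)
storage_client.bucket(BUCKET).blob("data.csv").upload_from_filename("data.csv")

# 2. Load the uploaded object into BigQuery, letting it infer the schema.
bq_client = bigquery.Client.from_service_account_json(KEY_PATH)
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
load_job = bq_client.load_table_from_uri(f"gs://{BUCKET}/data.csv", TABLE_ID, job_config=job_config)
load_job.result()  # wait for the load job to finish
print(f"Loaded {bq_client.get_table(TABLE_ID).num_rows} rows into {TABLE_ID}")
```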
Step 5 : Create a trigger with Cloud Functions
Open Cloud Functions and choose CREATE FUNCTION.
Configure the trigger so the function runs every time a new object is added to the GCS bucket.
Configure memory and timeout, and choose the service account.
Switch the Runtime to Python 3.x.
The Entry point and the function in the code must have the same name.
main.py is the file containing the function code; copy the contents of the local file cloud_function.py and paste it into main.py.
Add google.cloud and google.cloud.bigquery to the function's requirements.txt.
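For illustration, a 1st-gen Python background function for a Cloud Storage trigger (object finalize event) could look like the sketch below. This is an assumed implementation, not the repo's cloud_function.py; TABLE_ID is a placeholder, and the entry point would be set to main.

```python
# Illustrative 1st-gen background function for a Cloud Storage trigger (NOT cloud_function.py).
# Loads every newly finalized object in the bucket into a BigQuery table (placeholder name).
from google.cloud import bigquery

TABLE_ID = "my-project.vnstock.daily_prices"  # placeholder

def main(event, context):
    """Triggered when an object is finalized in the bucket; appends it to BigQuery."""
    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()  # uses the function's service account
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()
    print(f"Loaded {uri} into {TABLE_ID}")
```

(If you use a sketch like this instead, its only third-party dependency is the BigQuery client library.)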
Step 6 : Create VM with Compute Engine
Open Compute Engine and, under VM instances, choose CREATE INSTANCE.
Configure the machine according to your needs, then click CREATE.
After the virtual machine is created, select SSH in the Connect tab to open SSH-in-browser.
Step 7 : Update data daily
In SSH-in-browser, run the following command to set the VM's timezone:
sudo timedatectl set-timezone Asia/Ho_Chi_Minh
Copy the service account private key (the JSON file), update_data.py, and requirements.txt from your local machine to the VM using scp or the Upload File button in SSH-in-browser.
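For example, a hypothetical gcloud command for the copy (key file name, user name, instance name, and zone are placeholders):
gcloud compute scp service-account-key.json update_data.py requirements.txt <user_name>@<instance_name>:~ --zone=<zone>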
Setup pip3 : sudo apt-get update && sudo apt-get install python3-pip
Install libs : pip3 install -r requirements.txt
If a library fails to install because of a version conflict, open requirements.txt and delete that library's pinned version number.
If installation still fails on the vnstock library, open requirements.txt and remove the vnstock line, then upload update_data_vnstock.py and use it instead of update_data.py.
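For orientation, the daily update script could follow a pattern like the sketch below: fetch only the latest trading day for each ticker and upload a dated CSV to the bucket, which in turn fires the Cloud Function from Step 5. This is an assumed outline using the same legacy vnstock calls as before; KEY_PATH and BUCKET are placeholders and the repo's update_data.py may differ.

```python
# Illustrative daily-update sketch (NOT the repo's update_data.py).
# Assumes the legacy vnstock API; KEY_PATH and BUCKET are placeholders.
from datetime import date
import pandas as pd
from google.cloud import storage
from vnstock import listing_companies, stock_historical_data

KEY_PATH = "service-account-key.json"
BUCKET = "my-vnstock-bucket"

today = date.today().strftime("%Y-%m-%d")
frames = []
for symbol in listing_companies()["ticker"]:
    try:
        df = stock_historical_data(symbol=symbol, start_date=today, end_date=today)
        df["ticker"] = symbol
        frames.append(df)
    except Exception as exc:
        print(f"Skipping {symbol}: {exc}")

pd.concat(frames, ignore_index=True).to_csv("daily.csv", index=False)

# Upload the dated object; the object-finalize trigger then loads it into BigQuery.
client = storage.Client.from_service_account_json(KEY_PATH)
client.bucket(BUCKET).blob(f"daily/{today}.csv").upload_from_filename("daily.csv")
```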
Create the script : touch auto_run.sh && nano auto_run.sh
Add the following line to the opened file:
python3 update_data.py
Exit and save with Ctrl + X -> y -> Enter.
Config the cron job :
crontab -e
Add the following line to the opened file (run at minute 0 of hour 16, every day):
0 16 * * * bash /home/<user_name>/auto_run.sh
Exit and save with Ctrl + X -> y -> Enter.
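To confirm the job was registered, you can list the current user's cron table with:
crontab -l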