-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memory management: swap+pool #412
base: master
Are you sure you want to change the base?
Conversation
delay swap_plan() by 3 more iterations update train correct swap_sched(), swap_select(),swap_plan() correct load update in swap_select vec_run changed to new 3 iterations correct vec_run36 index issue correct overhead issue, verify vec_run.t vec_run duplicate to avoid sorting issue
verified itm 5 indices in Table_sched vec_swap_select pass by reference in swap_sched() impl swap_update_tables(), before DeploySwap(), both at Append() for time being, remove negative r_idx itms && git push origin vd1 handle last itr by impl sizeSqn and verification to change asyncSwapFlag back to 0
correct swap_construct_tables(), included negative r_idx for swap_update_tables() and DeploySwap() include negative r_idx for DeploySwap() impl GetRealGpuPtr() to swapIn nullptr Block at last iteration impl GetRealGpuPtr(), and optimize data() and mutable_data() impl GetRealGpuPtr(), and optimize data() and mutable_data() verify const issue change to return tempData instead of updating data_ without remove erasing in Table_not_at_device milestone of last itr, at 550 MB
new class of pool: SwapPool important APIs: PoolOpt(), Malloc(), Free() PoolOpt() takes in M/F sequences including those induced by swapping cross-iteration variables and last iteration case solved. record down MF after swap done, for one iteration
CMakeLists.txt
Outdated
@@ -89,7 +89,7 @@ SET(EXECUTABLE_OUTPUT_PATH ${PROJECT_BINARY_DIR}/bin) | |||
IF (USE_CUDA) | |||
include(ExternalProject) | |||
ExternalProject_Add(cnmem | |||
GIT_REPOSITORY "https://github.com/nusdbsystem/cnmem.git" | |||
GIT_REPOSITORY "https://github.com/junzhezhang/cnmem.git" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you change cnmem source code?
@@ -31,24 +31,25 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls keep the cnn.py (instead of alexnet.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't need to change the example model code.
include/singa/core/common.h
Outdated
Block(void* ptr, size_t size, size_t offset = 0) | ||
: data_(ptr), size_(size), offset_(offset) { | ||
Block(void* ptr, size_t size, size_t offset = 0, Device* ptrDevice = nullptr) | ||
: data_(ptr), size_(size), offset_(offset), ptrDevice_(ptrDevice) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr_device_
include/singa/core/device.h
Outdated
///SwapGPU | ||
struct onePieceMsg{ | ||
/* | ||
members: [ptr, size, MallocFree, idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls make the names consistent: MallocFree -> malloc_free
include/singa/core/device.h
Outdated
int MallocFree; | ||
int idx; | ||
double t; | ||
onePieceMsg(string p, size_t s, int M, int i):ptr(p),size(s),MallocFree(M),idx(i){} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const string &p
include/singa/core/device.h
Outdated
@@ -64,6 +65,11 @@ class Device { | |||
|
|||
/// Called by Tensor. | |||
void FreeBlock(Block* block); | |||
|
|||
void AppendInfo(string blockInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any comments on the blockInfo? better give an example.
include/singa/core/device.h
Outdated
@@ -102,6 +108,8 @@ class Device { | |||
|
|||
int id() const { return id_; } | |||
|
|||
virtual void* GetRealGpuPtr(const Block* block_) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean by real gpu ptr?
///SwapGPU | ||
struct onePieceMsg{ | ||
/* | ||
members: [ptr, size, operation_type, idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr to CPU or GPU memory?
include/singa/core/device.h
Outdated
map<int,std::tuple<int,int,int,int>>Table_sched; // changed to with sync_r_idx | ||
|
||
//vec_block | ||
vector<string>vec_block; //iteration 0-3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comments cannot explain the code.
include/singa/core/memory.h
Outdated
vector<string> vec; | ||
vector<string> vec_block_RW; | ||
vector<string> vec_block_RWMF; | ||
map<int,int>Table_r2d; //full duration info, cross-iteration duration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the names Table_r2d and Table_d2r are not descriptive..
src/core/common/common.cc
Outdated
ptr_device_->AppendInfo(temp); | ||
} | ||
if (data_ == nullptr) { | ||
cout<<"before GetRealGpuPtr, block_ and data_: "<<this<<' '<<data_<<endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cout??
It will dump too many prints on the screen..
src/core/device/swap_gpu.cc
Outdated
}; | ||
|
||
|
||
struct oneIterMsg{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this kind of names is not descriptive.
src/core/device/swap_gpu.cc
Outdated
//vector of pairMsg is used in run. | ||
//vector of iterMsg is used in test. | ||
|
||
vector<onePieceMsg> swap_strVec_2_pieceMsgVec(vector<string> vec, int &idxRange){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't mix different naming styles..
src/core/device/swap_gpu.cc
Outdated
idxRange = static_cast<int>(onePieceMsgVec_.size()); | ||
|
||
return onePieceMsgVec_; | ||
}// end of strVec_2_pieceMsgVec function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is meaningless..
better add some comments for the functionality or the arguments.
Updated the PR as per required on 09 Nov meeting, focused on correctness and code readability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- can you separate the Pool and Swap+Pool into 2 pull requests?
- please replace the strings with structs.
elif args.model == 'vgg': | ||
train_x, test_x = normalize_for_vgg(train_x, test_x) | ||
net = vgg.create_net(args.use_cpu) | ||
train((train_x, train_y, test_x, test_y), net, 250, vgg_lr, 0.0005, | ||
use_cpu=args.use_cpu) | ||
use_cpu=args.use_cpu,batch_size=args.batch_size) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, it would be better to keep the original example code.
@@ -24,7 +24,7 @@ | |||
#include <atomic> | |||
#include <memory> | |||
#include "singa/utils/logging.h" | |||
|
|||
#include <string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used?
include/singa/core/device.h
Outdated
@@ -64,6 +65,9 @@ class Device { | |||
|
|||
/// Called by Tensor. | |||
void FreeBlock(Block* block); | |||
|
|||
void AppendInfo(string block_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed, please use a structure or class for block_info instead of string.
float mem_limit_ratio = 0.70; | ||
size_t smallest_block = 1<<20; //1 MB | ||
int data_buffer = 4; // used to control readyIdx | ||
int mutable_data_buffer = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does 4 and 6 mean?
src/core/common/common.cc
Outdated
//Append block info: opt_type, ptr, time_stamp | ||
if (ptr_device_!=nullptr){ | ||
//Append info. | ||
stringstream strm2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code would be cleaner if the strings are replaced with struct.
int name; | ||
size_t size; | ||
int r_idx; | ||
int d_idx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are r_idx, d_idx and name?
Replaced the strings with structs in Append function as per requested, updated in branch vd2. New PR could not be created to Apache master, so it was created to my forked repo. For the other request, separation of Pool and Swap+Pool into 2 PR is not possible from git, as they were updated mixed and match. But they are well separated in different classes of Device and Memory |
new class of pool: SwapPool
important APIs: PoolOpt(), Malloc(), Free()
PoolOpt() takes in M/F seq including those induced by swapping
cross-iteration variables and last iteration case solved.