
memory management: swap+pool #412

Open
wants to merge 19 commits into base: master

Conversation

junzhezhang

new class of pool: SwapPool
important APIs: PoolOpt(), Malloc(), Free()
PoolOpt() takes in the malloc/free (M/F) sequence, including the operations induced by swapping
cross-iteration variables and the last-iteration case are solved.
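For orientation, a rough sketch of the pool interface described above. Malloc()/Free() follow SINGA's DeviceMemPool convention; the PoolOpt() signature and types below are assumptions, not the actual code in this PR:

class SwapPool : public DeviceMemPool {
 public:
  // Plan the pool layout from the recorded malloc/free (M/F) sequence,
  // including the mallocs/frees induced by swap-out/swap-in.
  void PoolOpt(std::vector<std::string>& vec_mf);   // assumed signature
  void Malloc(void** ptr, const size_t size) override;
  void Free(void* ptr) override;
};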

delay swap_plan() by 3 more iterations

update train

correct swap_sched(), swap_select(), swap_plan()

correct load update in swap_select

vec_run changed to new 3 iterations

correct vec_run36 index issue

correct overhead issue, verify vec_run.t

vec_run duplicate to avoid sorting issue
verified itm 5 indices in Table_sched

vec_swap_select pass by reference in swap_sched()

impl swap_update_tables(), before DeploySwap(), both at Append()

for time being, remove negative r_idx itms && git push origin vd1

handle last itr by impl sizeSqn and verification to change asyncSwapFlag back to 0
correct swap_construct_tables(), included negative r_idx for swap_update_tables() and DeploySwap()

include negative r_idx for DeploySwap()

impl GetRealGpuPtr() to swapIn nullptr Block at last iteration

impl GetRealGpuPtr(), and optimize data() and mutable_data()

impl GetRealGpuPtr(), and optimize data() and mutable_data()

verify const issue

change to return tempData instead of updating data_

without remove erasing in Table_not_at_device

milestone of last itr, at 550 MB
new class of pool: SwapPool
important APIs: PoolOpt(), Malloc(), Free()
PoolOpt() takes in M/F sequences including those induced by swapping
cross-iteration variables and last iteration case solved.

record down MF after swap done, for one iteration
CMakeLists.txt Outdated
@@ -89,7 +89,7 @@ SET(EXECUTABLE_OUTPUT_PATH ${PROJECT_BINARY_DIR}/bin)
IF (USE_CUDA)
include(ExternalProject)
ExternalProject_Add(cnmem
GIT_REPOSITORY "https://github.com/nusdbsystem/cnmem.git"
GIT_REPOSITORY "https://github.com/junzhezhang/cnmem.git"
Member

did you change cnmem source code?

@@ -31,24 +31,25 @@
import os
Member

pls keep the cnn.py (instead of alexnet.py)

Member

I think you don't need to change the example model code.

-  Block(void* ptr, size_t size, size_t offset = 0)
-      : data_(ptr), size_(size), offset_(offset) {
+  Block(void* ptr, size_t size, size_t offset = 0, Device* ptrDevice = nullptr)
+      : data_(ptr), size_(size), offset_(offset), ptrDevice_(ptrDevice) {
Member

ptr_device_
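i.e., a sketch of the constructor with the suggested snake_case member name (not the code as merged):

Block(void* ptr, size_t size, size_t offset = 0, Device* ptr_device = nullptr)
    : data_(ptr), size_(size), offset_(offset), ptr_device_(ptr_device) {}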

///SwapGPU
struct onePieceMsg{
/*
members: [ptr, size, MallocFree, idx]
Member

pls make the names consistent: MallocFree -> malloc_free

int MallocFree;
int idx;
double t;
onePieceMsg(string p, size_t s, int M, int i):ptr(p),size(s),MallocFree(M),idx(i){}
Member

const string &p
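Applying both naming suggestions (snake_case field, const-reference parameter), the struct might read as follows; a sketch only, not the committed code:

struct OnePieceMsg {
  string ptr;        // block pointer, encoded as a string
  size_t size;
  int malloc_free;   // malloc vs. free flag (exact encoding assumed)
  int idx;
  double t;          // time stamp
  OnePieceMsg(const string& p, size_t s, int m, int i)
      : ptr(p), size(s), malloc_free(m), idx(i) {}
};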

@@ -64,6 +65,11 @@ class Device {

/// Called by Tensor.
void FreeBlock(Block* block);

void AppendInfo(string blockInfo);
Member

any comments on the blockInfo? better give an example.
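For illustration, a hypothetical blockInfo string, assuming the "opt_type, ptr, time_stamp" style format used elsewhere in this diff (the real format is whatever the Append() path parses):

#include <sstream>
#include <string>
// Hypothetical helper, only to show the shape of the string passed to AppendInfo().
std::string MakeBlockInfo(const void* ptr, double time_stamp) {
  std::stringstream strm;
  strm << "Malloc " << ptr << " " << time_stamp;
  return strm.str();   // e.g. "Malloc 0x7f3a2c000000 1541728000.12"
}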

@@ -102,6 +108,8 @@ class Device {

int id() const { return id_; }

virtual void* GetRealGpuPtr(const Block* block_) = 0;
Member

what do you mean by real gpu ptr?

///SwapGPU
struct onePieceMsg{
/*
members: [ptr, size, operation_type, idx]
Member

ptr to CPU or GPU memory?

map<int,std::tuple<int,int,int,int>>Table_sched; // changed to with sync_r_idx

//vec_block
vector<string>vec_block; //iteration 0-3
Member

the comments cannot explain the code.

vector<string> vec;
vector<string> vec_block_RW;
vector<string> vec_block_RWMF;
map<int,int>Table_r2d; //full duration info, cross-iteration duration.
Member

the names Table_r2d and Table_d2r are not descriptive..

ptr_device_->AppendInfo(temp);
}
if (data_ == nullptr) {
cout<<"before GetRealGpuPtr, block_ and data_: "<<this<<' '<<data_<<endl;
Member

cout??
It will dump too many prints on the screen..
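One way to keep the trace without flooding stdout, assuming the LOG macros from singa/utils/logging.h (included earlier in this diff) behave like glog; a sketch only:

if (data_ == nullptr) {
  // Logged instead of printed, so the message can be filtered or suppressed in normal runs.
  LOG(INFO) << "before GetRealGpuPtr, block_ " << this << " data_ " << data_;
}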

};


struct oneIterMsg{
Member

this kind of names is not descriptive.

//vector of pairMsg is used in run.
//vector of iterMsg is used in test.

vector<onePieceMsg> swap_strVec_2_pieceMsgVec(vector<string> vec, int &idxRange){
Member

don't mix different naming styles..

idxRange = static_cast<int>(onePieceMsgVec_.size());

return onePieceMsgVec_;
}// end of strVec_2_pieceMsgVec function
Member

this comment is meaningless..
better add some comments for the functionality or the arguments.

@junzhezhang
Author

Updated the PR as required per the 09 Nov meeting, focusing on correctness and code readability.

Member
@nudles left a comment

  1. can you separate the Pool and Swap+Pool into 2 pull requests?
  2. please replace the strings with structs.

elif args.model == 'vgg':
train_x, test_x = normalize_for_vgg(train_x, test_x)
net = vgg.create_net(args.use_cpu)
train((train_x, train_y, test_x, test_y), net, 250, vgg_lr, 0.0005,
-          use_cpu=args.use_cpu)
+          use_cpu=args.use_cpu,batch_size=args.batch_size)
else:
Member

again, it would be better to keep the original example code.

@@ -24,7 +24,7 @@
#include <atomic>
#include <memory>
#include "singa/utils/logging.h"

#include <string>
Member

not used?

@@ -64,6 +65,9 @@ class Device {

/// Called by Tensor.
void FreeBlock(Block* block);

void AppendInfo(string block_info);
Member

as discussed, please use a structure or class for block_info instead of string.

float mem_limit_ratio = 0.70;
size_t smallest_block = 1<<20; //1 MB
int data_buffer = 4; // used to control readyIdx
int mutable_data_buffer = 6;
Member

what do 4 and 6 mean?

//Append block info: opt_type, ptr, time_stamp
if (ptr_device_!=nullptr){
//Append info.
stringstream strm2;
Member

the code would be cleaner if the strings are replaced with struct.
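A minimal sketch of the struct-based alternative; the field names and types are assumptions based on the "opt_type, ptr, time_stamp" comment above, not the actual vd2 code:

struct BlockInfo {
  Block* block;       // block the operation refers to
  int opt_type;       // operation type, e.g. malloc/free/read/write (encoding assumed)
  double time_stamp;  // when the operation happened
};

// Append() would then take the struct directly instead of a formatted string:
void Append(const BlockInfo& info);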

int name;
size_t size;
int r_idx;
int d_idx;
Member

what are r_idx, d_idx and name?

@junzhezhang
Author

Replaced the strings with structs in the Append function as requested; updated in branch vd2.

A new PR could not be created against Apache master, so it was created against my forked repo.

For the other request, separating Pool and Swap+Pool into 2 PRs is not possible in git, as they were updated together. However, they are well separated into different Device and Memory classes.
