
memory management: swap+pool #412

Open
wants to merge 19 commits into base: master

Conversation

junzhezhang

new class of pool: SwapPool
important APIs: PoolOpt(), Malloc(), Free()
PoolOpt() takes in the malloc/free (M/F) sequence, including the operations induced by swapping
cross-iteration variables and the last-iteration case are solved.
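For orientation, a rough sketch of the pool interface described above. Malloc()/Free() follow SINGA's DeviceMemPool convention; the PoolOpt() signature and types below are assumptions, not the actual code in this PR:

class SwapPool : public DeviceMemPool {
 public:
  // Plan the pool layout from the recorded malloc/free (M/F) sequence,
  // including the mallocs/frees induced by swap-out/swap-in.
  void PoolOpt(std::vector<std::string>& vec_mf);   // assumed signature
  void Malloc(void** ptr, const size_t size) override;
  void Free(void* ptr) override;
};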

delay swap_plan() by 3 more iterations

update train

correct swap_sched(), swap_select(), swap_plan()

correct load update in swap_select

vec_run changed to new 3 iterations

correct vec_run36 index issue

correct overhead issue, verify vec_run.t

vec_run duplicate to avoid sorting issue
verified itm 5 indices in Table_sched

vec_swap_select pass by reference in swap_sched()

impl swap_update_tables(), before DeploySwap(), both at Append()

for time being, remove negative r_idx itms && git push origin vd1

handle last itr by impl sizeSqn and verification to change asyncSwapFlag back to 0
correct swap_construct_tables(), included negative r_idx for swap_update_tables() and DeploySwap()

include negative r_idx for DeploySwap()

impl GetRealGpuPtr() to swapIn nullptr Block at last iteration

impl GetRealGpuPtr(), and optimize data() and mutable_data()

impl GetRealGpuPtr(), and optimize data() and mutable_data()

verify const issue

change to return tempData instead of updating data_

without remove erasing in Table_not_at_device

milestone of last itr, at 550 MB
new class of pool: SwapPool
important APIs: PoolOpt(), Malloc(), Free()
PoolOpt() takes in M/F sequences including those induced by swapping
cross-iteration variables and last iteration case solved.

record down MF after swap done, for one iteration
CMakeLists.txt Outdated
@@ -89,7 +89,7 @@ SET(EXECUTABLE_OUTPUT_PATH ${PROJECT_BINARY_DIR}/bin)
IF (USE_CUDA)
include(ExternalProject)
ExternalProject_Add(cnmem
GIT_REPOSITORY "https://github.com/nusdbsystem/cnmem.git"
GIT_REPOSITORY "https://github.com/junzhezhang/cnmem.git"
Member

did you change cnmem source code?

@@ -31,24 +31,25 @@
import os
Member

pls keep the cnn.py (instead of alexnet.py)

Member

I think you don't need to change the example model code.

-  Block(void* ptr, size_t size, size_t offset = 0)
-      : data_(ptr), size_(size), offset_(offset) {
+  Block(void* ptr, size_t size, size_t offset = 0, Device* ptrDevice = nullptr)
+      : data_(ptr), size_(size), offset_(offset), ptrDevice_(ptrDevice) {
Member

ptr_device_
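i.e., a sketch of the constructor with the suggested snake_case member name (not the code as merged):

Block(void* ptr, size_t size, size_t offset = 0, Device* ptr_device = nullptr)
    : data_(ptr), size_(size), offset_(offset), ptr_device_(ptr_device) {}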

///SwapGPU
struct onePieceMsg{
/*
members: [ptr, size, MallocFree, idx]
Member

pls make the names consistent: MallocFree -> malloc_free

int MallocFree;
int idx;
double t;
onePieceMsg(string p, size_t s, int M, int i):ptr(p),size(s),MallocFree(M),idx(i){}
Member

const string &p
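Applying both naming suggestions (snake_case field, const-reference parameter), the struct might read as follows; a sketch only, not the committed code:

struct OnePieceMsg {
  string ptr;        // block pointer, encoded as a string
  size_t size;
  int malloc_free;   // malloc vs. free flag (exact encoding assumed)
  int idx;
  double t;          // time stamp
  OnePieceMsg(const string& p, size_t s, int m, int i)
      : ptr(p), size(s), malloc_free(m), idx(i) {}
};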

@@ -64,6 +65,11 @@ class Device {

/// Called by Tensor.
void FreeBlock(Block* block);

void AppendInfo(string blockInfo);
Member

any comments on the blockInfo? better give an example.
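For illustration, a hypothetical blockInfo string, assuming the "opt_type, ptr, time_stamp" style format used elsewhere in this diff (the real format is whatever the Append() path parses):

#include <sstream>
#include <string>
// Hypothetical helper, only to show the shape of the string passed to AppendInfo().
std::string MakeBlockInfo(const void* ptr, double time_stamp) {
  std::stringstream strm;
  strm << "Malloc " << ptr << " " << time_stamp;
  return strm.str();   // e.g. "Malloc 0x7f3a2c000000 1541728000.12"
}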

@@ -102,6 +108,8 @@ class Device {

int id() const { return id_; }

virtual void* GetRealGpuPtr(const Block* block_) = 0;
Member

what do you mean by real gpu ptr?

///SwapGPU
struct onePieceMsg{
/*
members: [ptr, size, operation_type, idx]
Member

ptr to CPU or GPU memory?

map<int,std::tuple<int,int,int,int>>Table_sched; // changed to with sync_r_idx

//vec_block
vector<string>vec_block; //iteration 0-3
Member

the comments cannot explain the code.

vector<string> vec;
vector<string> vec_block_RW;
vector<string> vec_block_RWMF;
map<int,int>Table_r2d; //full duration info, cross-iteration duration.
Member

the names Table_r2d and Table_d2r are not descriptive..

ptr_device_->AppendInfo(temp);
}
if (data_ == nullptr) {
cout<<"before GetRealGpuPtr, block_ and data_: "<<this<<' '<<data_<<endl;
Member

cout??
It will dump too many prints on the screen..
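One way to keep the trace without flooding stdout, assuming the LOG macros from singa/utils/logging.h (included earlier in this diff) behave like glog; a sketch only:

if (data_ == nullptr) {
  // Logged instead of printed, so the message can be filtered or suppressed in normal runs.
  LOG(INFO) << "before GetRealGpuPtr, block_ " << this << " data_ " << data_;
}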

};


struct oneIterMsg{
Member

this kind of names is not descriptive.

//vector of pairMsg is used in run.
//vector of iterMsg is used in test.

vector<onePieceMsg> swap_strVec_2_pieceMsgVec(vector<string> vec, int &idxRange){
Member

don't mix different naming styles..

idxRange = static_cast<int>(onePieceMsgVec_.size());

return onePieceMsgVec_;
}// end of strVec_2_pieceMsgVec function
Member

this comment is meaningless..
better add some comments for the functionality or the arguments.

@junzhezhang
Author

Updated the PR as required per the 09 Nov meeting, focusing on correctness and code readability.

Member
@nudles left a comment

  1. can you separate the Pool and Swap+Pool into 2 pull requests?
  2. please replace the strings with structs.

elif args.model == 'vgg':
train_x, test_x = normalize_for_vgg(train_x, test_x)
net = vgg.create_net(args.use_cpu)
train((train_x, train_y, test_x, test_y), net, 250, vgg_lr, 0.0005,
-          use_cpu=args.use_cpu)
+          use_cpu=args.use_cpu,batch_size=args.batch_size)
else:
Member

again, it would be better to keep the original example code.

@@ -24,7 +24,7 @@
#include <atomic>
#include <memory>
#include "singa/utils/logging.h"

#include <string>
Member

not used?

@@ -64,6 +65,9 @@ class Device {

/// Called by Tensor.
void FreeBlock(Block* block);

void AppendInfo(string block_info);
Member

as discussed, please use a structure or class for block_info instead of string.

float mem_limit_ratio = 0.70;
size_t smallest_block = 1<<20; //1 MB
int data_buffer = 4; // used to control readyIdx
int mutable_data_buffer = 6;
Member

what do 4 and 6 mean?

//Append block info: opt_type, ptr, time_stamp
if (ptr_device_!=nullptr){
//Append info.
stringstream strm2;
Member

the code would be cleaner if the strings are replaced with struct.
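A minimal sketch of the struct-based alternative; the field names and types are assumptions based on the "opt_type, ptr, time_stamp" comment above, not the actual vd2 code:

struct BlockInfo {
  Block* block;       // block the operation refers to
  int opt_type;       // operation type, e.g. malloc/free/read/write (encoding assumed)
  double time_stamp;  // when the operation happened
};

// Append() would then take the struct directly instead of a formatted string:
void Append(const BlockInfo& info);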

int name;
size_t size;
int r_idx;
int d_idx;
Member

what are r_idx, d_idx and name?

@junzhezhang
Author

Replaced the strings with structs in the Append function as requested; updated in branch vd2.

A new PR could not be created against Apache master, so it was created against my forked repo.

For the other request, separating Pool and Swap+Pool into 2 PRs is not possible in git, as they were updated together. However, they are well separated into different Device and Memory classes.
