Check failed: (vals->size()) == (total_val) in KVWorker<Val>::Pull_ #100

Open
SmartAir opened this issue May 8, 2017 · 1 comment

SmartAir commented May 8, 2017

Hello, I store some push requests from a worker in a list and do not let the server handle them until a certain condition is satisfied (i.e., the server does not handle the current pull request immediately, but blocks it until a time I set). However, I ran into the following error:

```
[03:19:47] /home/xiongzi/mxnet/dmlc-core/include/dmlc/logging.h:235: [03:19:47] /home/xiongzi/mxnet/ps-lite/include/ps/kv_app.h:579: Check failed: (vals->size()) == (total_val)
terminate called after throwing an instance of 'dmlc::Error'
what(): [03:19:47] /home/xiongzi/mxnet/ps-lite/include/ps/kv_app.h:579: Check failed: (vals->size()) == (total_val)
```

I am puzzled by the check that raises the error in `int KVWorker<Val>::Pull_`:

```cpp
CHECK_EQ(vals->size(), total_val);
```

Could someone please explain this line of code?
Does `vals->size()` refer to the number of values in the current pull request, and `total_val` to the total number of values received for that pull request's timestamp? (Please point out any mistakes in my understanding.)
What is the purpose of checking whether `vals->size()` and `total_val` are equal? And what might cause the error above?
Thanks a lot!

For convenience, in case more context is needed, here is the function in `ps-lite/include/ps/kv_app.h` that raises the error:

```cpp
template <typename Val>
template <typename C, typename D>
int KVWorker<Val>::Pull_(
    const SArray<Key>& keys, C* vals, D* lens, int cmd, const Callback& cb) {
  int ts = obj_->NewRequest(kServerGroup);
  AddCallback(ts, [this, ts, keys, vals, lens, cb]() mutable {
      mu_.lock();
      auto& kvs = recv_kvs_[ts];
      mu_.unlock();

      // do check
      size_t total_key = 0, total_val = 0;
      for (const auto& s : kvs) {
        Range range = FindRange(keys, s.keys.front(), s.keys.back()+1);
        CHECK_EQ(range.size(), s.keys.size())
            << "unmatched keys size from one server";
        if (lens) CHECK_EQ(s.lens.size(), s.keys.size());
        total_key += s.keys.size();
        total_val += s.vals.size();
      }
      CHECK_EQ(total_key, keys.size()) << "lost some servers?";

      // fill vals and lens
      std::sort(kvs.begin(), kvs.end(), [](
          const KVPairs<Val>& a, const KVPairs<Val>& b) {
                  return a.keys.front() < b.keys.front();
        });
      CHECK_NOTNULL(vals);
      if (vals->empty()) {
        vals->resize(total_val);
      } else {
        CHECK_EQ(vals->size(), total_val);
      }
      Val* p_vals = vals->data();
      int *p_lens = nullptr;
      if (lens) {
        if (lens->empty()) {
          lens->resize(keys.size());
        } else {
          CHECK_EQ(lens->size(), keys.size());
        }
        p_lens = lens->data();
      }
      for (const auto& s : kvs) {
        memcpy(p_vals, s.vals.data(), s.vals.size() * sizeof(Val));
        p_vals += s.vals.size();
        if (p_lens) {
          memcpy(p_lens, s.lens.data(), s.lens.size() * sizeof(int));
          p_lens += s.lens.size();
        }
      }

      mu_.lock();
      recv_kvs_.erase(ts);
      mu_.unlock();
      if (cb) cb();
    });

  KVPairs<Val> kvs; kvs.keys = keys;
  Send(ts, false, cmd, kvs);
  return ts;
}
```
@SmartAir SmartAir changed the title Check failed: (vals->size()) == (total_val) Check failed: (vals->size()) == (total_val) in 'int KVWorker<Val>::Pull_' May 8, 2017
@SmartAir SmartAir changed the title Check failed: (vals->size()) == (total_val) in 'int KVWorker<Val>::Pull_' Check failed: (vals->size()) == (total_val) in KVWorker<Val>::Pull_ May 8, 2017
crafet commented Mar 8, 2018

Well, I checked the code and will try to explain it.
Since a pull sends the requested range of keys to the server nodes, the callback checks the responses.
`vals` is the buffer that stores the result. After the keys are split across servers, the values come back in per-server ranges, but their total count should still equal the size of `vals`.

eric-haibin-lin added a commit to eric-haibin-lin/ps-lite that referenced this issue Oct 15, 2021