diff --git a/recognition/partial_fc/README.md b/recognition/partial_fc/README.md
index 713e28bfd..7dccd701d 100644
--- a/recognition/partial_fc/README.md
+++ b/recognition/partial_fc/README.md
@@ -8,7 +8,7 @@ Partial FC is a distributed deep learning training framework for face recognitio
 [Partial FC](https://arxiv.org/abs/2010.05222)
 - [Largest Face Recognition Dataset: **Glint360k**](#Glint360K)
 - [Docker](#Docker)
-- [Performance On Million Identities](#Benchmark) 
+- [Performance On Million Identities](#Benchmark)
 - [FAQ](#FAQ)
 - [Citation](#Citation)
 
@@ -67,7 +67,7 @@ Use [unpack_glint360k.py](./unpack_glint360k.py) to unpack.
 - [x] [**Baidu Drive**](https://pan.baidu.com/s/1sd9ZRsV2c_dWHW84kz1P1Q) (code:befi)
 - [x] [**Google Drive**](https://drive.google.com/drive/folders/1WLjDzEs1wC1K1jxDHNJ7dhEmQ3rOOILl?usp=sharing)
 
-| Framework | backbone | sample_rate | IJBC@e4 | IFRT@e6 |
+| Framework | backbone | sample_rate (negative class centers) | IJBC@e4 | IFRT@e6 |
 | :--- | :--- | :--- | :--- | :--- |
 | mxnet | [R100](https://drive.google.com/drive/folders/1YPqIkOZWrmbli4GWfMJO2b0yiiZ7UCsP?usp=sharing) |1.0|97.3|-|
 | mxnet | [R100](https://drive.google.com/drive/folders/1-gF5sDwNoRcjwmpPSTNLpaZJi5N91BvL?usp=sharing) |0.1|97.3|-|
diff --git a/recognition/partial_fc/mxnet/README.md b/recognition/partial_fc/mxnet/README.md
index 383257a9b..df48d919f 100644
--- a/recognition/partial_fc/mxnet/README.md
+++ b/recognition/partial_fc/mxnet/README.md
@@ -1,4 +1,30 @@
-## [中文版本请点击这里](./README_CN.md)
+## Speed Up Training
+![Image text](https://github.com/nttstar/insightface-resources/blob/master/images/partial_fc.png)
+
+### 1. Model parallelism for the classification layer
+Class centers are evenly distributed across the GPUs. Only three communication steps are needed to compute
+the exact (loss-free) softmax.
+
+#### 1. Synchronization of the features
+Gather the features from every GPU so that each GPU holds the features of the whole batch, as shown by `AllGather(x_i)`.
+
+#### 2. Synchronization of the softmax denominator
+Each GPU first computes the local sum over its shard of class centers, and the global sum is then obtained through
+communication, as shown by `Allreduce(sum(exp(logits_i)))`.
+
+#### 3. Synchronization of the feature gradients
+The gradient of the logits can be computed independently on each GPU, and so can the gradient of the features. Finally,
+the partial feature gradients are summed across GPUs and sent back to the backbone, as shown by `Allreduce(delta(X))`.
+
+### 2. Softmax approximation
+
+A subset of the class centers is sufficient to approximate the softmax computation (the positive class centers must be
+included in this subset), which can be done with the following code:
+```python
+centers_p = func_positive(label)              # positive class centers, selected by the labels of the samples
+centers_n = func_negative(centers_p)          # negative class centers, randomly sampled after excluding the positive ones
+centers_final = concat(centers_n, centers_p)  # class centers that participate in the softmax computation
+```
 
 ## Train
 ### 1.Requirements
diff --git a/recognition/partial_fc/pytorch/README.md b/recognition/partial_fc/pytorch/README.md
index 13758f702..a2293b792 100644
--- a/recognition/partial_fc/pytorch/README.md
+++ b/recognition/partial_fc/pytorch/README.md
@@ -1,15 +1,5 @@
 # Parital FC
 
-## TODO
-
-- [x] **No BUG** Sampling
-- [ ] Pytorch Experiments (Glint360k, 1.0/0.1)
-- [ ] Mixed precision training
-- [ ] Pipeline Parallel
-- [ ] Checkpoint
-- [ ] Docker
-- [ ] A Wonderful Documents
-
 ## Results
 
 We employ ResNet100 as the backbone.
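
The three communication steps added to the mxnet README above can be illustrated with a minimal `torch.distributed` sketch. This is not the repository's implementation: the function name, `features`, and `local_centers` are illustrative, a process group is assumed to be initialized, and gradient handling is simplified to a comment.

```python
# Minimal sketch of the three communication steps, assuming each GPU holds
# one shard `local_centers` of the class-center matrix. Illustrative only.
import torch
import torch.distributed as dist

def partial_softmax_probs(features, local_centers):
    """features: (B, d) local batch; local_centers: (C_local, d) shard."""
    world_size = dist.get_world_size()

    # Step 1 -- AllGather(x_i): every GPU receives the features of all GPUs.
    gathered = [torch.zeros_like(features) for _ in range(world_size)]
    dist.all_gather(gathered, features)
    all_features = torch.cat(gathered, dim=0)    # (B * world_size, d)

    # Logits against this GPU's shard of class centers only.
    logits = all_features @ local_centers.t()    # (B * world_size, C_local)

    # Step 2 -- Allreduce(sum(exp(logits_i))): sum the local partial
    # softmax denominators over all GPUs to get the global denominator.
    # (A real implementation would also subtract the global max of the
    # logits first, via another all_reduce with ReduceOp.MAX, for stability.)
    exp_logits = torch.exp(logits)
    denom = exp_logits.sum(dim=1, keepdim=True)  # local partial sum
    dist.all_reduce(denom, op=dist.ReduceOp.SUM) # global denominator

    # Each GPU now holds the exact softmax probabilities for its own
    # C_local class centers.
    return exp_logits / denom

# Step 3 -- Allreduce(delta(X)): in the backward pass, each GPU computes a
# partial gradient of the loss w.r.t. all_features from its local logits;
# an all_reduce sums these partial gradients into the full feature
# gradient, which is then sent back to the backbone on each GPU.
```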
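Likewise, the sampling pseudocode in the mxnet README (`func_positive`, `func_negative`, and `concat` are placeholders) can be made concrete. The following is a hypothetical NumPy sketch of the idea, not the repository's sampler:

```python
import numpy as np

def sample_class_centers(label, num_classes, sample_rate, rng=None):
    """Pick the subset of class centers that participates in the softmax."""
    rng = rng or np.random.default_rng()
    num_sample = int(sample_rate * num_classes)

    # func_positive(label): classes present in this batch must be kept.
    centers_p = np.unique(label)
    if num_sample <= centers_p.size:
        return centers_p

    # func_negative(centers_p): sample negatives from the remaining classes.
    mask = np.ones(num_classes, dtype=bool)
    mask[centers_p] = False
    centers_n = rng.choice(np.flatnonzero(mask),
                           size=num_sample - centers_p.size,
                           replace=False)

    # concat(centers_n, centers_p): the centers used in this iteration.
    return np.concatenate([centers_n, centers_p])

# Example: with 360k classes and sample_rate=0.1, each iteration computes
# the softmax over roughly 36k class centers instead of all 360k.
labels = np.array([3, 17, 17, 42])
centers_final = sample_class_centers(labels, num_classes=360_000, sample_rate=0.1)
```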