3rd Community Review of NonEntropy Allocator #228

Open
joshua-ne opened this issue Nov 19, 2024 · 1 comment
Labels
  • Awaiting Response from Allocator: If there is a question that was raised in the issue that requires comment before moving forward.
  • Refresh: Applications received from existing Allocators for a refresh of DataCap allowance

Comments


joshua-ne commented Nov 19, 2024

Allocator Application: filecoin-project/notary-governance#1022
First Community Diligence: #63
Second Community Diligence: #148
Allocator Report: https://compliance.allocator.tech/report/f03018029/1731979795/report.md

Since the last community diligence, when we received a refresh of 5 PiB of DataCap, we have the following updates:

  1. New Client
    We have served the following new clients:
      • [DataCap Application] <Xenogic> - <Xenogic027> joshua-ne/FIL_DC_Allocator_1022#34
      • [DataCap Application] <Byte Tunneling> - <ByteTunneling_data_store_bc_fil_01> joshua-ne/FIL_DC_Allocator_1022#39
      • [DataCap Application] Pangeo Community joshua-ne/FIL_DC_Allocator_1022#41

  2. Progressive Allocation
    We continue to manage our Clients' application and approval process in accordance with the DataCap governance guidelines. We always start with a small amount and build trust by checking data distribution and SP retrieval rates. Where we saw suspicious behavior, we adjusted the schedule. For example, for [DataCap Application] <Byte Tunneling> - <ByteTunneling_data_store_bc_fil_01> joshua-ne/FIL_DC_Allocator_1022#39, we began with two consecutive 50TiB allocations, because the client did not report its change of SPs on time.

  3. Data Distribution
    Our Clients allocate DataCap to different SPs in reasonable proportions and ensure that SPs do not store duplicate CIDs.

  4. Emphasis on Data Retrieval
    We have been using both SPARK and our own retrieval tool, which is now open source. We introduced this tool to the community at a meeting (https://www.youtube.com/watch?v=XQlyGV4N_y8, starting at 44:10), and we believe it is better suited to allocators' usage. For example, we have seen SPs take advantage of the SPARK system by selectively broadcasting deals, say only 10% or even 1% of them, and still get a good retrieval rate. This method will not pass our checks, because we test the deals made between specific clients and SPs, using the data CIDs of specific batches. We always check before we approve, and we have rejected several signings because they did not perform well on SPARK or on our own retrieval system.

  5. Dataset Card Initiative to Make Data More Valuable
    To help the community make better use of the data onboarded through Fil+, we have started a new initiative called Dataset Card: [Dataset_Card]: <Sample Dataset> joshua-ne/FIL_DC_Allocator_1022_Dataset_Card#1. We are testing it with our past and ongoing clients. Once this mechanism is established, the community will have much better access to the real data on Filecoin, which may make development on Filecoin easier.
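The pair-level retrieval check described in item 4 can be sketched roughly as follows. This is a minimal illustration, not the allocator's open-source tool: the `Deal` fields, `sample_deals`, `retrieval_rate`, and the `try_retrieve` callback are all hypothetical names introduced here, and the actual retrieval attempt is left to the caller.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Deal:
    """One deal record (hypothetical fields, for illustration only)."""
    client: str     # client address
    provider: str   # storage provider (SP) ID
    data_cid: str   # data CID of the deal


def sample_deals(deals, client, provider, k, seed=0):
    """Pick up to k deals made between one specific client and one SP."""
    pool = [d for d in deals if d.client == client and d.provider == provider]
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def retrieval_rate(sampled, try_retrieve):
    """Fraction of sampled deals that try_retrieve(deal) reports as retrievable.

    try_retrieve is supplied by the caller and stands in for a real
    retrieval attempt against the SP.
    """
    if not sampled:
        return 0.0
    ok = sum(1 for d in sampled if try_retrieve(d))
    return ok / len(sampled)
```

Because the sample is drawn from the deals of one client and one SP, rather than from whatever the SP chooses to advertise, an SP that serves only a small fraction of its deals should score close to that fraction under this kind of check.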

As we reach out to more clients and SPs, we understand more about their needs and challenges in onboarding data to the Filecoin network. From what we've seen, the majority of them are willing to store and keep the unsealed copy of the deal data; we just need to do more work to make onboarding and retrieval easier.

As our operations expand, we are collaborating with an increasing number of clients, which has led to faster utilization of our quota and shorter application cycles. Therefore, we hope to secure a higher quota in this round. We deeply value the principles of FIL+ and have been consistently active in the FIL+ community. We have contributed tools such as retrieval platforms and CID databases, and have made these mature tools open source for the community. With this commitment, we will continue to handle and utilize the granted quota responsibly and prudently.

filecoin-watchdog added the Refresh and Awaiting Governance/Watchdog Comment labels on Nov 19, 2024
@filecoin-watchdog
Collaborator

@joshua-ne
Allocator Application
Compliance Report
1st Review
2nd Review
1st Review score: 5PiB
2nd Review score: 10PiB

8.5 PiB granted to existing clients:

| Client Name | DC | Status |
| --- | --- | --- |
| US General Services Administration | 3PiB | Existing |
| European Organization for Nuclear Research | 868TiB | New |
| Pangeo Community | 1.9PiB | New |
| Google DeepMind | 2.8PiB | New |

US General Services Administration

  • The client declared 5 replicas; why are there 9?
  • Replication is suboptimal: 48.91% of deals are for data replicated across fewer than 4 storage providers.
  • According to the Spark retrieval report, 6 SPs have retrieval rates that are too low.
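A replication figure like the one above could be computed along these lines. This is only a sketch under stated assumptions: it counts deals rather than bytes, it takes `(piece_cid, provider_id)` pairs as input, and the function name `low_replication_share` is made up for illustration. The compliance report may weight or group deals differently.

```python
from collections import defaultdict


def low_replication_share(deals, min_providers=4):
    """Share of deals whose data sits on fewer than min_providers distinct SPs.

    deals: iterable of (piece_cid, provider_id) pairs, one pair per deal.
    """
    deals = list(deals)
    if not deals:
        return 0.0
    # Collect the distinct SPs that hold each piece of data.
    providers_by_piece = defaultdict(set)
    for piece_cid, provider in deals:
        providers_by_piece[piece_cid].add(provider)
    # Count the deals whose piece is held by too few SPs.
    low = sum(
        1 for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return low / len(deals)
```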

European Organization for Nuclear Research

  • Did the allocator perform KYC?
  • The allocator is responsible for updating the SP list and ensuring the client does so as well.
  • The client declared 5 replicas, but 7 have already been made.
  • According to the Spark retrieval report, 2 SPs have retrieval rates below 30%.
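The two checks raised here (more replicas than declared, and SPs below a retrieval threshold) can be sketched as below. The helper names are hypothetical, a "replica" is assumed to mean one distinct SP holding a copy of the data, and the 30% threshold is taken from the comment above.

```python
def replica_overrun(declared, providers_with_data):
    """How many more distinct SPs hold the data than the client declared."""
    observed = len(set(providers_with_data))
    return max(0, observed - declared)


def low_retrieval_providers(rates, threshold=0.30):
    """Sorted list of SP IDs whose retrieval rate is below the threshold.

    rates: dict mapping SP ID to a retrieval rate between 0.0 and 1.0.
    """
    return sorted(sp for sp, rate in rates.items() if rate < threshold)
```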

Pangeo Community

Google DeepMind

  • Did the allocator perform KYC?
  • The client requested 10PiB but claims the dataset is 2PiB, with only 4 replicas planned, which totals 8PiB. This is inconsistent.
  • All SPs match the provided list.
  • The client declared 4 replicas, but 7 have already been made.
  • Why was the 2nd allocation 256TiB instead of 100TiB?
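The inconsistency in the requested amount is simple arithmetic, shown here as a small sketch using the figures quoted above (the function name is illustrative only):

```python
def expected_datacap_pib(dataset_pib, replicas):
    """DataCap needed to store every planned replica of the dataset, in PiB."""
    return dataset_pib * replicas


requested_pib = 10                          # amount the client requested
expected_pib = expected_datacap_pib(2, 4)   # 2PiB dataset, 4 replicas planned
unexplained_pib = requested_pib - expected_pib

print(f"expected {expected_pib}PiB, requested {requested_pib}PiB, "
      f"unexplained {unexplained_pib}PiB")
```

Under these figures, 2PiB of the request is not accounted for by the declared dataset size and replica count.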

  • The allocator ensures that the SP list is always up-to-date and reflects the latest information in the form.
  • There is no clear evidence of KYC being performed; please clarify this matter.
  • The allocator consistently monitors client retrieval using both proprietary tools and common reports. This demonstrates the allocator's strong commitment to thoroughly vet its clients and maintain high standards for the network.
  • The allocator has introduced an additional solution for its clients in the form of a dataset card. Dear @joshua-ne, could you provide more information about this solution? What is its intended purpose, and where are the responses from clients who complete this form stored? I would appreciate more details on this.

filecoin-watchdog added the Awaiting Response from Allocator label and removed the Awaiting Governance/Watchdog Comment label on Nov 28, 2024