This repository was archived by the owner on Jul 18, 2024. It is now read-only.

[DataCap Application] Slingshot NEXRAD dataset for team Kernelogic #398

Closed
kernelogic opened this issue Jun 8, 2022 · 44 comments

@kernelogic

kernelogic commented Jun 8, 2022

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

  • Organization Name: Fei Yan - Kernelogic
  • Website / Social Media: https://slingshot.kernelogic.ca/ @feiya200
  • Total amount of DataCap being requested (between 500 TiB and 5 PiB): 5 PiB
  • Weekly allocation of DataCap requested (usually between 1-100TiB): 500 TiB
  • On-chain address for first allocation: f1ioosn6kwao6q34lxs4twrycjqeeir4pv3qh5cci

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

I have successfully completed a few LDNs on other datasets, and I have a record showing that I have followed the decentralization rules and have done zero self-dealing.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/60
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/59
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/46
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/297
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/298
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/304

What is the primary source of funding for this project?

I am hoping to use this DataCap at no cost with active community providers on Slack, especially SPX / enterprise-sp-wg members.

I expect to receive a Slingshot prize in return for my effort, as I need to spend a lot of time preparing the dataset, arranging bandwidth and hardware, and communicating with providers.

What other projects/ecosystem stakeholders is this project associated with?

Slingshot, enterprise-sp-wg.

Use-case details

Describe the data being stored onto Filecoin

Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.

Where was the data in this dataset sourced from?

https://registry.opendata.aws/noaa-nexrad/

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

The data is primarily compressed binary data. The site below demonstrates how to consume and render it:
https://nbviewer.org/gist/dopplershift/356f2e14832e9b676207

s3://noaa-nexrad-level2/2021/01/01/TSDF/TSDF20210101_235417_V08
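The sample key above appears to encode the radar station, scan time, and format version in its file name. A minimal sketch of splitting it apart, assuming the first four characters are the station identifier and that the timestamp is in UTC; `NexradKey` and `parse_nexrad_key` are illustrative names, not part of any NOAA or AWS tooling:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class NexradKey(NamedTuple):
    station: str
    scan_time: datetime
    version: str


def parse_nexrad_key(key: str) -> NexradKey:
    """Split a Level II object key into station, scan time, and version tag."""
    filename = key.rsplit("/", 1)[-1]             # e.g. TSDF20210101_235417_V08
    stamp, version = filename[4:].rsplit("_", 1)  # "20210101_235417", "V08"
    # Assumption: timestamps in the file name are UTC.
    scan_time = datetime.strptime(stamp, "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)
    return NexradKey(filename[:4], scan_time, version)


sample = parse_nexrad_key(
    "s3://noaa-nexrad-level2/2021/01/01/TSDF/TSDF20210101_235417_V08"
)
print(sample.station, sample.scan_time.isoformat(), sample.version)
# TSDF 2021-01-01T23:54:17+00:00 V08
```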

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

AWS open dataset

What is the expected retrieval frequency for this data?

I expect some retrievals during the prize judging period, as well as from anyone interested in downloading this dataset.

For how long do you plan to keep this dataset stored on Filecoin?

Per the Slingshot rules, a minimum of 1 year. Most likely 520 days.
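Deal durations on Filecoin are counted in chain epochs rather than days. A small sketch of the conversion for the two durations mentioned above, assuming the 30-second mainnet epoch; `days_to_epochs` is an illustrative helper, not a function from any Filecoin client:

```python
EPOCH_SECONDS = 30                               # Filecoin mainnet epoch length
EPOCHS_PER_DAY = 24 * 60 * 60 // EPOCH_SECONDS   # 2880


def days_to_epochs(days: int) -> int:
    """Convert a deal duration in days to a number of chain epochs."""
    return days * EPOCHS_PER_DAY


print(days_to_epochs(365))  # 1051200
print(days_to_epochs(520))  # 1497600
```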

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

All regions.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

I will upload my prepared CAR files to a web server and coordinate with providers to download and propose offline deals.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

I plan to deal with SPX, approved slingshot restore SPs and enterprise-sp-wg members, as well as any real community providers who are interested.

To name a few from the community that I deal with regularly: PIKNIK, Holon, CabrinaHuang, HarryM, BigBear, j1v, XinAn Xu, WillTechMusing.

How will you be distributing deals across storage providers?

Evenly across all providers I propose deals to, if they can handle the volume. If a miner is itself a notary, it will receive no more than 10% of the total granted DataCap.
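As a rough illustration of that plan, the sketch below splits a total evenly and clamps any notary-run provider at 10% of the total. The provider names are hypothetical, and leaving the clamped remainder unassigned is an assumption made here for simplicity, not something the application specifies:

```python
def plan_shares(total_tib, providers, notary_run, notary_cap_pct=10):
    """Return (shares, unassigned) in TiB for an even split with a notary cap."""
    even = total_tib / len(providers)
    cap = total_tib * notary_cap_pct / 100
    # Notary-run providers get the smaller of the even share and the cap.
    shares = {p: min(even, cap) if p in notary_run else even for p in providers}
    unassigned = total_tib - sum(shares.values())
    return shares, unassigned


# Hypothetical example: 5 PiB across eight providers, one of them notary-run.
providers = [f"sp-{i}" for i in range(1, 9)]
shares, unassigned = plan_shares(5 * 1024, providers, notary_run={"sp-1"})
print(shares["sp-1"], shares["sp-2"], unassigned)  # 512.0 640.0 128.0
```

With eight providers the even share (640 TiB) exceeds the 10% cap (512 TiB), so the notary-run provider is clamped and 128 TiB would need to be placed elsewhere.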

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

I have all I need to start making deals.
@large-datacap-requests

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and contact you back pretty soon.

@galen-mcandrew
Collaborator

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1ioosn6kwao6q34lxs4twrycjqeeir4pv3qh5cci

@large-datacap-requests

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1ioosn6kwao6q34lxs4twrycjqeeir4pv3qh5cci

DataCap allocation requested

250TiB


Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceca3ix43am6jjlgst75fftqfsif3r4coiiioxvltgatlf6cf5veyy

Address

f1ioosn6kwao6q34lxs4twrycjqeeir4pv3qh5cci

Datacap Allocated

250.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceca3ix43am6jjlgst75fftqfsif3r4coiiioxvltgatlf6cf5veyy

@raghavrmadya
Collaborator

raghavrmadya commented Aug 30, 2022

Duplicate proposal canceled.

@large-datacap-requests

Deleting comment

@raghavrmadya does not have permission to post this comment.

Please, contact the assignee of this issue.

@large-datacap-requests large-datacap-requests bot deleted a comment from raghavrmadya Aug 30, 2022
@raghavrmadya
Collaborator

@cryptowhizzard Can you please propose again?

@raghavrmadya raghavrmadya self-assigned this Aug 30, 2022
@raghavrmadya
Collaborator

DataCap Allocation requested

Request number 5

Multisig Notary address

f01858410

Client address

f1ioosn6kwao6q34lxs4twrycjqeeir4pv3qh5cci

DataCap allocation requested

1.34PiB


Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea5bxyrrndhyvqk72hapvepjy3lsaedc7bxswz5ezgcxz6xry24xm

Address

f1ioosn6kwao6q34lxs4twrycjqeeir4pv3qh5cci

Datacap Allocated

1.34PiB

Signer Address

f1fg6jkxsr3twfnyhdlatmq36xca6sshptscds7xa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea5bxyrrndhyvqk72hapvepjy3lsaedc7bxswz5ezgcxz6xry24xm

@kernelogic
Author

Closing this issue as it's all allocated. Thanks everyone for helping with it.

@filplus-checker

DataCap and CID Checker Report [1]

  • Organization: Fei Yan - Kernelogic
  • Client: f1ioosn6kwao6q34lxs4twrycjqeeir4pv3qh5cci

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

  • Storage provider should not exceed 25% of total datacap.
  • Storage provider should not be storing duplicate data for more than 20%.
  • Storage provider should have published its public IP address.
  • All storage providers should be located in different regions.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01882177 (new) | Singapore, Singapore, SG | 354.84 TiB | 6.99% | 327.59 TiB | 7.68% |
| f01878005 (new) | Singapore, Singapore, SG | 354.78 TiB | 6.99% | 327.53 TiB | 7.68% |
| f01882184 (new) | Singapore, Singapore, SG | 354.38 TiB | 6.98% | 327.13 TiB | 7.69% |
| f01880047 (new) | Singapore, Singapore, SG | 354.16 TiB | 6.98% | 326.94 TiB | 7.69% |
| f01882035 | Chengdu, Sichuan, CN | 330.28 TiB | 6.51% | 330.28 TiB | 0.00% |
| f0143858 | Clifton, New Jersey, US | 327.94 TiB | 6.46% | 327.94 TiB | 0.00% |
| f03223 | San Jose, California, US | 319.59 TiB | 6.30% | 319.59 TiB | 0.00% |
| f0240185 | Clifton, New Jersey, US | 305.41 TiB | 6.02% | 305.41 TiB | 0.00% |
| f01877571 (new) | Singapore, Singapore, SG | 298.22 TiB | 5.87% | 271.06 TiB | 9.11% |
| f02301 | San Jose, California, US | 248.91 TiB | 4.90% | 248.91 TiB | 0.00% |
| f01852023 | Busan, Busan, KR | 206.16 TiB | 4.06% | 206.16 TiB | 0.00% |
| f01851482 | Busan, Busan, KR | 201.94 TiB | 3.98% | 201.94 TiB | 0.00% |
| f01402814 | Singapore, Singapore, SG | 186.53 TiB | 3.67% | 186.53 TiB | 0.00% |
| f01918045 | Kuala Lumpur, Kuala Lumpur, MY | 168.72 TiB | 3.32% | 168.72 TiB | 0.00% |
| f01918046 | Kuala Lumpur, Kuala Lumpur, MY | 168.72 TiB | 3.32% | 168.72 TiB | 0.00% |
| f01909705 | Kuala Lumpur, Kuala Lumpur, MY | 168.72 TiB | 3.32% | 168.72 TiB | 0.00% |
| f01924827 | Hong Kong, Central and Western, HK | 165.53 TiB | 3.26% | 165.53 TiB | 0.00% |
| f01863339 | Shanghai, Shanghai, CN | 112.00 TiB | 2.21% | 112.00 TiB | 0.00% |
| f01852325 | Hong Kong, Central and Western, HK | 75.00 TiB | 1.48% | 75.00 TiB | 0.00% |
| f01850141 | Hong Kong, Central and Western, HK | 75.00 TiB | 1.48% | 75.00 TiB | 0.00% |
| f034258 | Chengdu, Sichuan, CN | 74.91 TiB | 1.48% | 74.91 TiB | 0.00% |
| f01652333 | Sunnyvale, California, US | 72.22 TiB | 1.42% | 72.22 TiB | 0.00% |
| f01119939 | Dallas, Texas, US | 56.22 TiB | 1.11% | 56.22 TiB | 0.00% |
| f01864434 | Sydney, New South Wales, AU | 36.72 TiB | 0.72% | 36.72 TiB | 0.00% |
| f01702940 | Dallas, Texas, US | 36.28 TiB | 0.71% | 36.28 TiB | 0.00% |
| f01225882 | Burnaby, British Columbia, CA | 18.72 TiB | 0.37% | 18.72 TiB | 0.00% |
| f01662356 | Ashburn, Virginia, US | 2.41 TiB | 0.05% | 2.41 TiB | 0.00% |
| f01873432 | Las Vegas, Nevada, US | 1.97 TiB | 0.04% | 1.97 TiB | 0.00% |
| f01879914 (new) | San Jose, California, US | 32.00 GiB | 0.00% | 32.00 GiB | 0.00% |
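The two numeric restrictions listed above (25% share, 20% duplicates) can be checked mechanically against rows like these. The following is only a sketch of that idea, not the checker bot's actual implementation; the `ProviderRow` type and `find_violations` function are made up for illustration, and the IP and region rules are left out because the report does not say how they are evaluated:

```python
from dataclasses import dataclass


@dataclass
class ProviderRow:
    provider: str
    share_pct: float      # "Percentage" column
    duplicate_pct: float  # "Duplicate Deals" column


MAX_SHARE_PCT = 25.0      # provider should not exceed 25% of total datacap
MAX_DUPLICATE_PCT = 20.0  # provider should not store more than 20% duplicates


def find_violations(rows):
    """Return a list of human-readable threshold violations."""
    problems = []
    for row in rows:
        if row.share_pct > MAX_SHARE_PCT:
            problems.append(f"{row.provider}: {row.share_pct}% of total datacap")
        if row.duplicate_pct > MAX_DUPLICATE_PCT:
            problems.append(f"{row.provider}: {row.duplicate_pct}% duplicate data")
    return problems


# Three rows copied from the report above.
rows = [
    ProviderRow("f01882177", 6.99, 7.68),
    ProviderRow("f01877571", 5.87, 9.11),
    ProviderRow("f01882035", 6.51, 0.00),
]
print(find_violations(rows) or "no violations in this sample")
```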

Provider Distribution

Deal Data Replication

The table below shows how many copies of each piece of unique data are stored across storage providers.

  • No more than 25% of unique data should be stored with fewer than 4 providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 2.69 TiB | 13.44 TiB | 5 | 0.26% |
| 32.00 GiB | 192.00 GiB | 6 | 0.00% |
| 416.00 GiB | 2.84 TiB | 7 | 0.06% |
| 1.00 TiB | 8.47 TiB | 8 | 0.17% |
| 9.53 TiB | 89.22 TiB | 9 | 1.76% |
| 30.63 TiB | 396.09 TiB | 10 | 7.80% |
| 25.50 TiB | 280.66 TiB | 11 | 5.53% |
| 74.22 TiB | 890.63 TiB | 12 | 17.54% |
| 26.38 TiB | 345.34 TiB | 13 | 6.80% |
| 26.41 TiB | 378.88 TiB | 14 | 7.46% |
| 47.13 TiB | 708.13 TiB | 15 | 13.95% |
| 37.69 TiB | 612.84 TiB | 16 | 12.07% |
| 61.00 TiB | 1.03 PiB | 17 | 20.81% |
| 16.25 TiB | 292.50 TiB | 18 | 5.76% |
| 32.00 GiB | 608.00 GiB | 19 | 0.01% |
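The replication rule can be expressed as the fraction of unique data held by fewer than 4 providers. A hedged sketch using the unique sizes and provider counts from this report, with GiB values converted to TiB; `under_replicated_fraction` is an illustrative name and this is not the checker's own code:

```python
def under_replicated_fraction(rows, min_providers=4):
    """Fraction of unique data stored with fewer than `min_providers` providers."""
    total = sum(size for size, _ in rows)
    thin = sum(size for size, count in rows if count < min_providers)
    return thin / total if total else 0.0


# (unique data in TiB, number of providers), taken from the report above.
rows = [
    (2.69, 5), (0.03125, 6), (0.40625, 7), (1.00, 8), (9.53, 9),
    (30.63, 10), (25.50, 11), (74.22, 12), (26.38, 13), (26.41, 14),
    (47.13, 15), (37.69, 16), (61.00, 17), (16.25, 18), (0.03125, 19),
]
fraction = under_replicated_fraction(rows)
print(f"{fraction:.2%} of unique data is below the 4-provider floor")
print("healthy" if fraction <= 0.25 else "unhealthy")
```

Every row here has at least 5 providers, so the fraction is zero, which is consistent with the "healthy" verdict in the report.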

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients.
Usually, different applications own different data, which should not resolve to the same CID.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Verifier |
| --- | --- | --- | --- | --- |
| f154a4iq5mxq76avoooc5a3unchfbrjg7itkjfl6y | Fei Yan - Kernelogic | 33.00 TiB | 1,056 | LDN v3 multisig |

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger
