
Raw/serving community history mvp v2 #164

Draft
wants to merge 3 commits into
base: master
from

Conversation

0x-r4bbit
Member

This is the same as #162 but without all of the comments (#162 is getting too big to load).



## Supported platforms

Creating, distributing, downloading and unpacking community history archives SHOULD only be supported in desktop clients. Mobile clients SHOULD NOT implement this functionality due to bandwidth and storage limitations.
Creating, distributing, downloading and unpacking community history archives SHOULD only be supported in Status desktop clients. Status mobile clients SHOULD NOT implement this functionality due to bandwidth and storage limitations.

I would change this to: "Status mobile clients are out of scope for the MVP of this service. However, we may consider bringing this service to Status Mobile in the future."

Thinking behind this comment: in many countries mobile bandwidth is plentiful, very fast, and very cheap. In Ukraine, for example, unlimited mobile data costs about $5 per month, and speeds are normally between 50 and 100 megabits per second (faster than wired ADSL in London!). In the US mobile data is both expensive and slow, but for people in countries with fast and cheap mobile data, mobile data usage isn't a worry.

Re. storage: 64 GB seems to be becoming the storage baseline these days, with the latest top-of-the-line phones now offering 1 TB of storage(!!!). So storage will probably not be a limitation on mobile in the future.

Member Author

Happy to change that!

2. Community owner enables community history archive support
3. A special type of channel for distributing magnet links ([Magnet URI scheme](https://en.wikipedia.org/wiki/Magnet_URI_scheme), [Extensions for Peers to Send Metadata Files](https://www.bittorrent.org/beps/bep_0009.html)) is created
1. Community owner creates a Status community (previously known as [org channels](https://github.com/status-im/specs/pull/151))
2. Community owner enables community history archive support (possibly on by default but can be turned off as well - see [UI feature spec](https://github.com/status-im/feature-specs/pull/36))
@John-44 John-44 Jan 17, 2022

Perhaps change to "Community owner doesn't disable community history archive support"

Remove word "possibly", target is on by default

Member Author

Sounds good!

3. A special type of channel for distributing magnet links ([Magnet URI scheme](https://en.wikipedia.org/wiki/Magnet_URI_scheme), [Extensions for Peers to Send Metadata Files](https://www.bittorrent.org/beps/bep_0009.html)) is created
1. Community owner creates a Status community (previously known as [org channels](https://github.com/status-im/specs/pull/151))
2. Community owner enables community history archive support (possibly on by default but can be turned off as well - see [UI feature spec](https://github.com/status-im/feature-specs/pull/36))
3. A special type of channel to exchange metadata about the archival data is created

Add: Note this channel is not visible in the UI.

Member Author

Can add this. The reason I didn't mention anything about it being hidden is that this is a spec for the protocol, which should be UI agnostic.


## Storing live messages

Community owner nodes MUST store live messages as [14/WAKU2-MESSAGE](https://rfc.vac.dev/spec/14/). This is required to provide confidentiality, authenticity, and integrity of message data distributed via the BitTorrent layer, and later validated by Status nodes when they unpack message history archives.

Community owner nodes SHOULD remove those messages from their local databases after they have been turned into archives and distributed to the BitTorrent network.
Community owner nodes SHOULD remove those messages from their local databases once they are older than 30 days and after they have been turned into message archives and distributed to the BitTorrent network.

Are you sure this is correct? If a community owner node removes all messages older than 30 days from its local database, how will the community owner be able to access messages older than 30 days using the Status desktop UI?

Member Author

Ah, we had this same discussion in the other PR :D

See: #162 (comment)

This is about removing the Waku messages, not the application messages.


ahh yes, thanks for reminding me! Ignore this comment ;-)


Community owner nodes SHOULD store these files in a dedicated folder that is identifiable via the community id.

Community owner nodes MAY remove older torrent files that were generated for previous message history archives, after a new torrent was created.

Shouldn't the removing of the previous (old) torrent file be automatic after the new torrent file is created? Is there any reason not to do this?

Member Author

That's a good point. The new/latest torrent file will replace the current one, making this paragraph obsolete.



The community owner node MUST send magnet links containing message archives and the message archive index to a special community channel. The topic of that special channel follows the following format:

```
/{application-name}/{version-of-the-application}/{content-topic-name}/{encoding}
```

Member Author

@0x-r4bbit 0x-r4bbit Jan 17, 2022

Still need to update this. What content-topic-name should we use here? @staheri14 thoughts?
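
While the name is still open, here is a minimal sketch of how a topic in this format could be assembled. The function name `build_content_topic` and the segment values `status`, `1`, `community-archives` and `proto` are placeholders for illustration only; none of them are proposed by this PR.

```python
def build_content_topic(application_name: str, version: str,
                        content_topic_name: str, encoding: str) -> str:
    """Assemble /{application-name}/{version-of-the-application}/{content-topic-name}/{encoding}."""
    parts = [application_name, version, content_topic_name, encoding]
    for part in parts:
        # Reject empty segments and embedded slashes so the topic
        # always has exactly four segments.
        if not part or "/" in part:
            raise ValueError(f"invalid topic segment: {part!r}")
    return "/" + "/".join(parts)


# Placeholder values, not the names the spec will end up using.
topic = build_content_topic("status", "1", "community-archives", "proto")
```

Whatever `content-topic-name` is chosen, validating the segments up front keeps a stray slash in a name from silently producing a topic with five segments.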

@staheri14 staheri14 left a comment

Thank you Pascal for the spec, overall looks good to me!
I have left some comments and suggestions. Did not get to review the second half of the spec, will do it during the week and leave further comments.


This specification has the following assumptions:

- Store nodes are available 24/7, ensuring constant live message availability


You may add another item "The storage time range limit is 30 days."


### Serving community history archives

Community owner nodes go through the following (high level) process to provide community members with message histories (assumes community owner node is available 24/7):


This part

(assumes community owner node is available 24/7):

can be included in the list of assumptions above.

Community owner nodes go through the following (high level) process to provide community members with message histories (assumes community owner node is available 24/7):

1. Community owner creates a Status community (previously known as [org channels](https://github.com/status-im/specs/pull/151))
2. Community owner doesn't disable community history archive support (on by default but can be turned off as well - see [UI feature spec](https://github.com/status-im/feature-specs/pull/36))


Do these two terms mean the same thing: "community history archive support" and "message archive capabilities" (which is used in the Terminology table)?
If yes, I suggest using one term consistently across the spec.

Community owner nodes go through the following (high level) process to provide community members with message histories (assumes community owner node is available 24/7):

1. Community owner creates a Status community (previously known as [org channels](https://github.com/status-im/specs/pull/151))
2. Community owner doesn't disable community history archive support (on by default but can be turned off as well - see [UI feature spec](https://github.com/status-im/feature-specs/pull/36))


Suggested change
2. Community owner doesn't disable community history archive support (on by default but can be turned off as well - see [UI feature spec](https://github.com/status-im/feature-specs/pull/36))
2. Community owner enables community history archive support (on by default but can be turned off as well - see [UI feature spec](https://github.com/status-im/feature-specs/pull/36))

1. Community owner creates a Status community (previously known as [org channels](https://github.com/status-im/specs/pull/151))
2. Community owner doesn't disable community history archive support (on by default but can be turned off as well - see [UI feature spec](https://github.com/status-im/feature-specs/pull/36))
3. A special type of channel to exchange metadata about the archival data is created, this channel should not be visible in the user interface
4. Community owner invites members and creates additional channels


Suggested change
4. Community owner invites members and creates additional channels
4. Community owner invites community members and creates additional channels visible in the user interface


Of course, a community owner doesn't have to create any additional channels; a community can contain just a single channel. A newly created community always contains one channel by default.

The `timestamp` is determined by the context in which the community owner node attempts to create a message history archives as described below:

1. The community owner node attempts to create an archive periodically for the past seven days (including the current day). In this case, the `timestamp` has to lie within those 7 days.
2. The community owner node has been offline (owner node's main process has stopped and needs restart) and attempts to create archives for all the live messages it has missed since it went offline. In this case, the `timestamp` has to lie within the day the latest message was received and the current day.


Shouldn't the whole missed time range be broken into 7-day intervals? I think item 1 already covers it and there is no need for the second item.
I'd suggest adding the explanation of item 2 (how to measure the missed time range) under the second time of this section https://github.com/status-im/specs/blob/c9184b74b5623fe14aa4bd99ed8b268a65984117/docs/raw/serving-community-history.md#serving-archives-for-missed-messages

Member Author

Okay, will update this. I thought it would be good to be explicit about the scenario that the node can be offline, in which case there can be up to 30 days worth of messages from which the node needs to create archives (4, 1 for each 7 days)
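
To make the offline case concrete, here is a rough sketch of breaking a missed time range into 7-day intervals, as suggested above. The function name `split_into_archive_ranges` and the use of half-open `(lower, upper)` ranges are assumptions for illustration, not something the spec defines.

```python
from datetime import datetime, timedelta, timezone

# Archive interval taken from the spec text: seven days per archive.
ARCHIVE_INTERVAL = timedelta(days=7)


def split_into_archive_ranges(start, end, interval=ARCHIVE_INTERVAL):
    """Split [start, end) into consecutive ranges of at most `interval`.

    Returns a list of (lower, upper) tuples in ascending order.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    ranges = []
    lower = start
    while lower < end:
        # The last range may be shorter than a full interval.
        upper = min(lower + interval, end)
        ranges.append((lower, upper))
        lower = upper
    return ranges


# Hypothetical example: the node was offline for exactly 28 days.
last_seen = datetime(2022, 1, 1, tzinfo=timezone.utc)
missed = split_into_archive_ranges(last_seen, last_seen + timedelta(days=28))
```

With this particular splitting, 28 missed days give four full ranges, while 30 days give four full ranges plus a two-day remainder. Whether such a remainder is archived right away or deferred until it spans seven days is something the spec would still need to state.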


## Exporting messages for bundling

Community owner nodes export messages from their local database for creating and bundling history archives using the following criteria:


Suggested change
Community owner nodes export messages from their local database for creating and bundling history archives using the following criteria:
Community owner nodes export Waku messages from their local database for creating and bundling history archives using the following criteria:


Community owner nodes export messages from their local database for creating and bundling history archives using the following criteria:

- Messages to be exported MUST have a `contentTopic` that match any of the topics of the community channels


Suggested change
- Messages to be exported MUST have a `contentTopic` that match any of the topics of the community channels
- Waku messages to be exported MUST have a `contentTopic` that match any of the topics of the community channels

Community owner nodes export messages from their local database for creating and bundling history archives using the following criteria:

- Messages to be exported MUST have a `contentTopic` that match any of the topics of the community channels
- Messages to be exported MUST have a `timestamp` that lies within a time range of 7 days


Suggested change
- Messages to be exported MUST have a `timestamp` that lies within a time range of 7 days
- Waku messages to be exported MUST have a `timestamp` that lies within a time range of 7 days
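
The two export criteria above can be sketched as a simple filter. This is only an illustration: `StoredWakuMessage`, `select_for_export`, the integer timestamps, and the half-open range `[range_from, range_to)` are assumptions, not definitions from the spec or from 14/WAKU2-MESSAGE.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StoredWakuMessage:
    # Hypothetical, simplified stand-in for a stored Waku message.
    content_topic: str
    timestamp: int  # unit is an assumption; only the ordering matters here
    payload: bytes


def select_for_export(messages: Iterable[StoredWakuMessage],
                      channel_topics: Iterable[str],
                      range_from: int,
                      range_to: int) -> List[StoredWakuMessage]:
    """Keep messages whose contentTopic belongs to a community channel
    and whose timestamp lies in [range_from, range_to)."""
    topics = set(channel_topics)
    selected = [
        m for m in messages
        if m.content_topic in topics and range_from <= m.timestamp < range_to
    ]
    # Sort by timestamp so repeated exports produce the same order.
    return sorted(selected, key=lambda m: m.timestamp)


# Hypothetical data: only the first message matches both criteria.
stored = [
    StoredWakuMessage("/channel/a", 100, b"in range"),
    StoredWakuMessage("/channel/a", 900, b"too late"),
    StoredWakuMessage("/other", 150, b"wrong topic"),
]
exported = select_for_export(stored, ["/channel/a"], 0, 700)
```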


## Storing live messages

Community owner nodes MUST store live messages as [14/WAKU2-MESSAGE](https://rfc.vac.dev/spec/14/). This is required to provide confidentiality, authenticity, and integrity of message data distributed via the BitTorrent layer, and later validated by Status nodes when they unpack message history archives.


Suggested change
Community owner nodes MUST store live messages as [14/WAKU2-MESSAGE](https://rfc.vac.dev/spec/14/). This is required to provide confidentiality, authenticity, and integrity of message data distributed via the BitTorrent layer, and later validated by Status nodes when they unpack message history archives.
For the archival data serving, community owner nodes MUST store live messages as [14/WAKU2-MESSAGE](https://rfc.vac.dev/spec/14/). This is in addition to their database of application messages. This is required to provide confidentiality, authenticity, and integrity of message data distributed via the BitTorrent layer, and later validated by Status nodes when they unpack message history archives.

@staheri14 staheri14 left a comment

Added more suggestions and questions

### WakuMessageHistoryArchive

The `from` field SHOULD contain a timestamp of the time range's lower bound.


Suggested change
Its type parallels the `timestamp` field of [`WakuMessage`](https://rfc.vac.dev/spec/14/#payloads).

Member Author

Can you elaborate on what it means for it to "parallel" the field?


It has the same type as the timestamp field of the waku message.


The `messages` field MUST contain all messages that belong into the archive given its `from`, `to` and `contentTopic` fields.

The `padding` field MUST contain the amount of zero bytes needed so that the overall byte size of the protobuf encoded `WakuMessageArchive` is a multiple of the `pieceLength` used to divide the message archive data into pieces, as explained in [creating message archive torrents](#creating-message-archive-torrents).


"pieces" is unclear at this point, I'd suggest something like below:

The padding field MUST contain the amount of zero bytes needed so that the overall byte size of the protobuf encoded WakuMessageArchive is a multiple of the pieceLength. This is needed for seamless encoding and decoding of archival data in interaction with BitTorrent as explained in creating message archive torrents.
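
As a rough illustration of the arithmetic behind the `padding` field, the sketch below computes how many zero bytes bring a byte size up to the next multiple of `pieceLength`. The function name `padding_length` is hypothetical, and the sketch works on raw byte counts only. In a real protobuf encoding the `padding` field adds its own tag and length prefix, so the archive has to be re-encoded and re-checked after padding is added.

```python
def padding_length(encoded_size: int, piece_length: int) -> int:
    """Number of zero bytes needed so that encoded_size plus the padding
    is a multiple of piece_length (0 if it already is)."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    if encoded_size < 0:
        raise ValueError("encoded_size must not be negative")
    # Python's modulo of a negative number is non-negative, so this
    # yields the distance to the next multiple of piece_length.
    return (-encoded_size) % piece_length


# Hypothetical sizes: a 25-byte archive with a piece length of 10.
needed = padding_length(25, 10)
```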


## Message history archive index

Community owner nodes MUST provide message archives for the entire community history. Each individual archive only contains a subset of the complete history, that is, data for a time range of seven days, and all message history archives are concatenated into a single file as byte string (see [Ensuring reproducible data pieces](#ensuring-reproducible-data-pieces)).


Suggested change
Community owner nodes MUST provide message archives for the entire community history. Each individual archive only contains a subset of the complete history, that is, data for a time range of seven days, and all message history archives are concatenated into a single file as byte string (see [Ensuring reproducible data pieces](#ensuring-reproducible-data-pieces)).
Community owner nodes MUST provide message archives for the entire community history. The entire history consists of a set of `WakuMessageArchive`s where each archive contains a subset of historical `WakuMessage`s for a time range of seven days. All the `WakuMessageArchive`s are concatenated into a single file as a byte string (see [Ensuring reproducible data pieces](#ensuring-reproducible-data-pieces)).

- `data` - Contains all protobuf encoded message history archives concatenated in ascending order
- `index` - Contains the protobuf encoded message history archive index

Community owner nodes SHOULD store these files in a dedicated folder that is identifiable via the community id.


Community id is not defined

Member Author

You mean it should be added to the terminology section?


Yes, indeed


dedicated folder that is identifiable via the community id.

Can you please elaborate on this? Should the folder have the same name as the community id?


A torrent's source folder MUST contain the following two files:

- `data` - Contains all protobuf encoded message history archives concatenated in ascending order


Maybe this

Suggested change
- `data` - Contains all protobuf encoded message history archives concatenated in ascending order
- `data` - Contains all protobuf encoded `WakuMessageArchive`s (as bit strings) concatenated in ascending order based on their time

A torrent's source folder MUST contain the following two files:

- `data` - Contains all protobuf encoded message history archives concatenated in ascending order
- `index` - Contains the protobuf encoded message history archive index
@staheri14 staheri14 Jan 19, 2022

Suggested change
- `index` - Contains the protobuf encoded message history archive index
- `index` - Contains the protobuf encoded `WakuMessageArchiveIndex`.


The community owner node MUST ensure that the byte string resulting from the protobuf encoded `data` is equal to the byte string `data` from the previously generated message archive torrent, plus the data of the latest 7 days worth of messages encoded as `WakuMessageArchive`. Therefore, the size of `data` grows every seven days as it's append only.

The community owner nodes also MUST ensure that the byte size of every individual `WakuMessageArchive` encoded protobuf is a multiple of `pieceLength: ???` (**TODO**) using the `padding` field. If the last piece of a message archive has fewer bytes than `pieceLength`, it MUST be filled with zero bytes until it has the size `pieceLength`.


As the piece is not properly introduced yet, I suggest the following edits:

Suggested change
The community owner nodes also MUST ensure that the byte size of every individual `WakuMessageArchive` encoded protobuf is a multiple of `pieceLength: ???` (**TODO**) using the `padding` field. If the last piece of a message archive has fewer bytes than `pieceLength`, it MUST be filled with zero bytes until it has the size `pieceLength`.
The community owner nodes also MUST ensure that the byte size of every individual `WakuMessageArchive` encoded protobuf is a multiple of `pieceLength: ???` (**TODO**) using the `padding` field. If the protobuf encoded `WakuMessageArchive` is not a multiple of `pieceLength`, its `padding` field MUST be filled with zero bytes and the `WakuMessageArchive` MUST be re-encoded until its size becomes a multiple of `pieceLength`.


The community owner nodes also MUST ensure that the byte size of every individual `WakuMessageArchive` encoded protobuf is a multiple of `pieceLength: ???` (**TODO**) using the `padding` field. If the last piece of a message archive has fewer bytes than `pieceLength`, it MUST be filled with zero bytes until it has the size `pieceLength`.

This is necessary because message history archive data will be split into pieces of `pieceLength` when the torrent file is created, and the SHA1 hash of every piece is then stored in the torrent file and later used by other nodes to request the data for each individual data piece.


Suggested change
This is necessary because message history archive data will be split into pieces of `pieceLength` when the torrent file is created, and the SHA1 hash of every piece is then stored in the torrent file and later used by other nodes to request the data for each individual data piece.
This is necessary because the content of the `data` file will be split into pieces of `pieceLength` when the torrent file is created, and the SHA1 hash of every piece is then stored in the torrent file and later used by other nodes to request the data for each individual data piece.

docs/raw/serving-community-history.md
@0x-r4bbit
Member Author

Thanks @staheri14 for all the suggestions!!

Will update the PR accordingly. One thing we need to clarify is what pieceLength should be. This value is usually calculated based on the overall data size. For this spec to work, we need to decide and agree on a pieceLength regardless of the data size.

Any ideas?

@staheri14

staheri14 commented Jan 20, 2022

One thing we need to clarify is what pieceLength should be. This value is usually calculated based on the overall data size. For this spec to work, we need to decide and agree on a pieceLength regardless of the data size. Any ideas?

@PascalPrecht

I imagine a smaller piece size is good for resource-limited members to easily get started and to contribute, but the connection overhead and bandwidth usage would be higher (to keep track of pieces and to announce ownerships). In contrast, a larger piece size contributes to lowering connection and bandwidth overhead, though we should be mindful of the resource limit of members.
I'll think about what can be a meaningful pieceLength (although, on the spec level, we may just recommend a value rather than fixing it.)

There might be studies around this topic as well.
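
To put rough numbers on this trade-off, the sketch below counts pieces and the bytes of SHA1 hashes a torrent file would carry for a given data size. The helper names and the example sizes are hypothetical. The only fixed fact used is that a SHA1 digest is 20 bytes.

```python
SHA1_DIGEST_SIZE = 20  # bytes per SHA1 hash


def piece_count(data_size: int, piece_length: int) -> int:
    """Number of pieces needed to cover data_size bytes."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    # Integer ceiling division; a partially filled last piece still counts.
    return -(-data_size // piece_length)


def piece_hashes_size(data_size: int, piece_length: int) -> int:
    """Total bytes of SHA1 hashes stored for all pieces."""
    return piece_count(data_size, piece_length) * SHA1_DIGEST_SIZE


# Hypothetical comparison for 1 GiB of archive data.
one_gib = 1024 ** 3
small_pieces = piece_count(one_gib, 16 * 1024)     # 16 KiB pieces
large_pieces = piece_count(one_gib, 1024 * 1024)   # 1 MiB pieces
```

Smaller pieces mean many more hashes to store and track. Larger pieces mean more padding per archive, since every `WakuMessageArchive` has to be padded to a multiple of `pieceLength`.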


This is necessary because message history archive data will be split into pieces of `pieceLength` when the torrent file is created, and the SHA1 hash of every piece is then stored in the torrent file and later used by other nodes to request the data for each individual data piece.

By fitting message archives into a multiple of `pieceLength` and ensuring they fill possible remainding space with zero bytes, community owner nodes prevent the **next** message archive to occupy that remainding space of the last piece, which will result in a different SHA1 hash for that piece.


Suggested change
By fitting message archives into a multiple of `pieceLength` and ensuring they fill possible remainding space with zero bytes, community owner nodes prevent the **next** message archive to occupy that remainding space of the last piece, which will result in a different SHA1 hash for that piece.
By fitting message archives into a multiple of `pieceLength` and ensuring they fill possible remaining space with zero bytes, community owner nodes prevent the **next** message archive to occupy that remaining space of the last piece, which will result in a different SHA1 hash for that piece.

```
20 // piece[2] SHA1: 0x789
```

The next `WakuMessageArchive` "A3" will be appended ("#3") to the existing data and occupy the remainding space of the third data piece. The piece at index 2 will now produce a different SHA1 hash:


Suggested change
The next `WakuMessageArchive` "A3" will be appended ("#3") to the existing data and occupy the remainding space of the third data piece. The piece at index 2 will now produce a different SHA1 hash:
The next `WakuMessageArchive` "A3" will be appended ("#3") to the existing `data` and occupy the remaining space of the third data piece. The piece at index 2 will now produce a different SHA1 hash:

#3 #3 #3 #3 #3 #3 #3 #3 #3 #3 // piece[3]
```

By filling up the remainding space of the third piece with A2 using its `padding` field, it is guaranteed that its SHA1 will stay the same:

Suggested change
By filling up the remainding space of the third piece with A2 using its `padding` field, it is guaranteed that its SHA1 will stay the same:
By filling up the remaining space of the third piece with A2 using its `padding` field, it is guaranteed that its SHA1 will stay the same:
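
To illustrate the padding argument, here is a minimal sketch, not the spec's implementation. It assumes a hypothetical, tiny `PIECE_LENGTH` of 10 bytes and models each archive as raw bytes with zero bytes appended, whereas the spec carries the padding in the archive's `padding` field. The helper names `pad_to_piece_length` and `piece_hashes` are made up for this example.

```python
import hashlib

# Hypothetical piece length, kept tiny so the effect is easy to see.
PIECE_LENGTH = 10


def pad_to_piece_length(archive: bytes, piece_length: int = PIECE_LENGTH) -> bytes:
    """Append zero bytes so the archive length is a multiple of piece_length."""
    remainder = len(archive) % piece_length
    if remainder == 0:
        return archive
    return archive + b"\x00" * (piece_length - remainder)


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[str]:
    """Split data into pieces and return the SHA1 hex digest of each piece."""
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]


# Three stand-in archives of 10, 14 and 16 bytes (assumed sizes).
a1, a2, a3 = b"1" * 10, b"2" * 14, b"3" * 16

# Without padding, appending A3 changes the hash of the piece at index 2.
unpadded_before = piece_hashes(a1 + a2)
unpadded_after = piece_hashes(a1 + a2 + a3)

# With padding, the hashes of all previously published pieces are unchanged.
padded_before = piece_hashes(pad_to_piece_length(a1) + pad_to_piece_length(a2))
padded_after = piece_hashes(
    pad_to_piece_length(a1) + pad_to_piece_length(a2) + pad_to_piece_length(a3)
)
```

In this sketch the padded layout only ever adds new pieces, so nodes that already hold the earlier pieces would not need to fetch them again.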


## Seeding message history archives

The community owner node MUST seed the [generated torrent](#creating-message-archive-torrents) until a new message history archive is created.

Suggested change
The community owner node MUST seed the [generated torrent](#creating-message-archive-torrents) until a new message history archive is created.
The community owner node MUST seed the [generated torrent](#creating-message-archive-torrents) until a new message history archive `WakuMessageArchive` is created.

The community owner node MUST update the `WakuMessageArchiveIndex` every time it creates one or more `WakuMessageArchive`s and bundle it into a new torrent (**TODO: see section**).
For every created `WakuMessageArchive`, there MUST be a `WakuMessageArchiveIndexMetadata` entry in the `archives` field `WakuMessageArchiveIndex`.

## Creating message archive torrents

The structure of the torrent file is not explained here, isn't it needed?

/{application-name}/{version-of-the-application}/{content-topic-name}/{encoding}
```

All messages sent with this topic MUST be instances of `ApplicationMetadataMessage` ([6/PAYLOADS](/specs/6-payloads)) with a `payload` of `CommunityMessageArchiveIndex`.

Is `CommunityMessageArchiveIndex` going to be added? I could not find it.

How about the type of the message? I think by `CommunityMessageArchiveIndex` you meant the type of the `ApplicationMetadataMessage`?

What happened to the magnet link? Shouldn't it be published in the channel?


Since the magnet links are created from the community owner node's database (and previously distributed archives), the message history provided by the community owner becomes the canonical message history and single source of truth for the community.

Community member nodes MUST replace messages in their local databases with the messages extracted from archives within the same time range. Messages that didn't receive the community owner node MUST be removed and are no longer part of the message history of interest, even if it already existed in a community member node's database.

Suggested change
Community member nodes MUST replace messages in their local databases with the messages extracted from archives within the same time range. Messages that didn't receive the community owner node MUST be removed and are no longer part of the message history of interest, even if it already existed in a community member node's database.
Community member nodes MUST replace messages in their local databases with the messages extracted from archives within the same time range. Messages that the community owner node didn't receive MUST be removed and are no longer part of the message history of interest, even if it already existed in a community member node's database.
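
The replacement rule can be sketched as follows. This is only an illustration under stated assumptions: the `Message` type, the `reconcile` helper, and the half-open time range `[start, end)` are hypothetical and not taken from the spec, and the archive is assumed to contain only messages inside that range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical minimal message shape for this sketch.
    id: str
    timestamp: int


def reconcile(local, archived, start, end):
    """Replace local messages in [start, end) with the archived ones.

    Local messages inside the range that are missing from the archive are
    dropped; local messages outside the range are kept untouched.
    """
    outside = [m for m in local if not (start <= m.timestamp < end)]
    return sorted(outside + list(archived), key=lambda m: m.timestamp)


local_db = [Message("a", 5), Message("b", 15), Message("c", 25)]
archive = [Message("d", 12)]

# "b" falls inside the archive's range but is not in the archive, so it is removed.
result = reconcile(local_db, archive, start=10, end=20)
```

Whether the range boundaries are inclusive or exclusive is left open here; the half-open interval is just one possible choice.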


## Fetching message history archives

Generally, fetching message history archives is a tree step process:

Suggested change
Generally, fetching message history archives is a tree step process:
Generally, fetching message history archives is a three step process:


Generally, fetching message history archives is a tree step process:

1. Receive message archive index magnet link as described in [Message archive distribution], download `index` file from torrent, then determine which message archives to download

I'd suggest linking to the "Creating message archive torrents" section for this part:

download index file from torrent


Community member nodes subscribe to the special channel that community owner nodes publish magnet links for message history archives to. There are two scenarios in which member nodes can receive such a magnet link message from the special channel:

1. The member node receives it via live messages, that is, messages that are relayed by store nodes

Store nodes are known for their storage service, as part of which they also relay messages by running the Waku relay protocol. So store nodes are a subset of relay nodes, not all of them. Waku relay nodes are the ones that run the relay protocol.
I think it might be better to skip this explanation, as it may confuse readers.

Suggested change
1. The member node receives it via live messages, that is, messages that are relayed by store nodes
1. The member node receives it via live messages by listening to the special channel

3 participants