How to store metadata about a feed #48
Comments
pros of second file:
One file has the advantage of showing metadata that changes, for instance "added new profile pic on date" or "followed @ on date", if we use syntax that is similar to the non-commented version:
|
I like the idea of doing this in comments at the top of the file. I think the advantages of having everything in the same file outweigh any added complexity when swapping out the files if they get too big or whatever. However, I think we would quickly hit limitations with a simple key-value system: how would you easily store a list of follows with this, for example? A good format could be YAML, I think. It's human-readable and writable, and widely supported; we would just need to strip out the comment character at the start of each line before parsing it. I imagine the header for twtxt would then look something like this:
Edit: somehow forgot to add the urls to the user list... |
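A minimal sketch of the "strip out the comment character before parsing" step, assuming the header is the unbroken run of `#` lines at the top of the feed. The function name `extract_commented_header`, the sample keys, and the example.org URL are hypothetical; the result is plain YAML text that could be handed to whichever YAML parser a client prefers.

```python
def extract_commented_header(feed_text, comment_char="#"):
    """Return the leading comment block of a feed with the comment prefix removed."""
    header_lines = []
    for line in feed_text.splitlines():
        if not line.startswith(comment_char):
            break  # the header ends at the first non-comment line
        body = line[len(comment_char):]
        # Drop exactly one space after the comment character and keep any
        # deeper indentation, because YAML nesting depends on it.
        if body.startswith(" "):
            body = body[1:]
        header_lines.append(body)
    return "\n".join(header_lines)


# Hypothetical feed: a commented YAML header followed by one regular tweet.
feed = (
    "# nick: example\n"
    "# following:\n"
    "#   - nick: alice\n"
    "#     url: https://example.org/alice.txt\n"
    "2016-03-07T12:00:00+00:00\tHello twtxt\n"
)
header = extract_commented_header(feed)
```

Only one space is removed per line so that nested lists keep their relative indentation.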
@tedder consider that if you use a separate file for metadata and it also supports including messages you quickly obsolete the twtxt format as the syndication format of choice. Every client will just use the format that provides more data. Thus, the original twtxt format would be mainly useful as input (like Markdown or ReStructured Text) for scripts generating feeds. |
@erlehmann I said nothing about including messages in a second file. @reednj I like the idea of metadata at the top, instead of happening anywhere in twtxt. I (personally) like yml, it's extensible in cases like this. |
@tedder to demonstrate: Yeah, you do not have to include messages. But any format that is powerful enough to include the metadata can be utilized for that and then you are back at using a single file. I have written a small shell script that converts a twtxt feed to the format described in RFC 4287, which describes how to convey author name/email, contributor name/email, the time of publication and the last update for a document. Since RFC 4287 also describes how to include messages, I just included them! Here is the input file: http://daten.dieweltistgarnichtso.net/tmp/docs/twtxt.txt |
@reednj RFC 5005 describes a mechanism to link together several physical documents that form one logical document. It is not that hard it seems, as long as the first document contains the metadata about the aggregate. |
@reednj I see a problem with your example as it does not give URLs in the source, only nicknames. In reality, you would need the URL. |
@reednj I am not familiar with yaml. How can you do namespaces in yaml? As far as I see, you would need namespacing for forwards compatibility. |
So sounds like commented YAML could be the way to go? I wonder if @buckket has an opinion? Also, please no namespaces, that is the very definition of YAGNI |
reednj could you explain how a format can be extensible if you do not have namespaces without basically ignoring everything in the file that is not in the default namespace? Or is the metadata format you envision a fixed format without any additional semantics, ever? |
Personally I would love to see twtxt either commit to a truly minimalist “no metadata” stance, or simply use Atom as the default format in a single file. Atom has everything you need. It is not the most terse file format; the existing twtxt format is the most terse if that’s what you’re shooting for. But as soon as we start trying to approximate feature-parity with Twitter, it’s likely we’ll just end up reinventing Atom/RSS poorly. Atom is human-readable, it’s a truly well-made and well-defined standard, there’s widespread support for it. |
You can have meta data about the user at the top of the file, without having any meta data about the messages, which is basically what I'm pushing for. I don't think we can or should or need to compete with twitter. The appeal of twtxt is its simplicity, and xml is the opposite of that in every way. |
I second @reednj.
The minimalist part here needs to stay. The fact that we can use only one (or two soon?) lines for each tweet makes it simple and clear to use. |
I agree - we need user data for any sort of network propagation, but the messages themselves should remain as ephemeral and simple as they are currently. I think you hit the nail on the head. |
@mkody as I said, twtxt can be an input format for an already existing representation, like Markdown. Try http://news.dieweltistgarnichtso.net/bin/twtxt2atom out and you might see what I am proposing. @Benaiah what is “network propagation” ? |
So to be clear, official support for things like replies to chain messages together in conversations are absolutely off the table? If so, then that feels consistent and I can dig it. |
@erlehmann So you mean that we could keep the twtxt file and make an atom feed from it? |
I like the way @reednj posted! Advantages:
I really like atom and especially atom sync protocol, but twtxts simplicity and posting to your feed as simple as Everything we add with |
After thinking about this topic for a few days, I'm sure benaiah's first suggestion would be a very good fit for twtxt. If we just use comments like
somewhere in the file, it would be very easy even for the most simple client to read and write metadata in the feed. With things like YAML or INI, by contrast, you couldn't just read the file line by line and would probably need a parser to do the work. This format would also allow the feed to record who you once followed or your old twturl if somebody needs that. And for the argument about needing to parse the whole twtfile just to get the metadata: we currently parse the complete file every time to build the timeline, so I'm not sure if this is even an issue. I have the strong feeling we should just use the easiest and most minimal solution one can think of. I mean, that's what twtxt is all about, right? :) |
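A rough sketch of the line-by-line reading described above. The concrete comment syntax did not survive in this thread, so the `# key = value` shape, the key names, and the example.org URLs are assumptions made purely for illustration.

```python
def read_comment_metadata(feed_text):
    """Collect (key, value) pairs from '# key = value' comment lines, in file order."""
    entries = []
    for line in feed_text.splitlines():
        if not line.startswith("#"):
            continue  # a regular tweet line
        key, sep, value = line[1:].partition("=")
        if not sep or not key.strip():
            continue  # a plain comment, not metadata
        entries.append((key.strip(), value.strip()))
    return entries


# Hypothetical feed with metadata interspersed between tweets (log style).
feed = (
    "# nick = example\n"
    "2016-03-01T10:00:00+00:00\tfirst post\n"
    "# follow = alice https://example.org/alice.txt\n"
    "# just a remark\n"
    "2016-03-02T10:00:00+00:00\tsecond post\n"
    "# unfollow = alice https://example.org/alice.txt\n"
)
entries = read_comment_metadata(feed)
```

Returning a list rather than a dictionary keeps repeated keys and their order, which is what makes the follow/unfollow history recoverable.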
mdom's suggestion sounds very reasonable. I also like the log style approach therein. |
We talked a little about it on irc, and we would also propose to add a timestamp to the comment, so the client can reorder metadata as it seems fit. Some would leave it interspersed in the file and others could move metadata to the top of the file. |
to still allow for simple sorting by timestamps, irc style commands could be an alternative to # comments:
|
Then tweets cannot start with a '/' (0x2F) character anymore. I don't think it's that much of a bother compared to what metadata storage can do, and I assume it's easier to parse than having to determine that the first character is a '#' and parse date and metadata altogether. Here you can just parse things naturally using the existing methods, and if the first character of the message is a '/', then store that line as metadata, not a tweet. |
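To illustrate the IRC-style idea, here is a small sketch that parses every line as `timestamp<TAB>message` and then files messages starting with `/` as metadata. The command names and the example.org URL are hypothetical; the thread does not fix a command vocabulary.

```python
def split_feed(feed_text):
    """Split a feed into tweets and '/command' metadata entries."""
    tweets, metadata = [], []
    for line in feed_text.splitlines():
        timestamp, sep, message = line.partition("\t")
        if not sep:
            continue  # not a "timestamp<TAB>message" line
        if message.startswith("/"):
            # Everything up to the first space is the command name.
            command, _, argument = message[1:].partition(" ")
            metadata.append((timestamp, command, argument))
        else:
            tweets.append((timestamp, message))
    return tweets, metadata


# Hypothetical feed: one follow command, one tweet, one "/me" message.
feed = (
    "2016-03-07T12:00:00+00:00\t/follow alice https://example.org/alice.txt\n"
    "2016-03-07T12:05:00+00:00\tHello\n"
    "2016-03-07T12:06:00+00:00\t/me waves\n"
)
tweets, metadata = split_feed(feed)
```

Note that the `/me waves` line ends up in `metadata`, which is exactly the restriction on tweets starting with a slash mentioned above.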
Though I still prefer the lines starting with comments, this would also be a fine choice. It's a good point that you wouldn't have to add special syntax. But I wonder how often users want to start tweets with /me or path names, and then you need some kind of escaping mechanism... :/ |
If this is the approach it would be better to use some uncommon unicode character (e.g. |
Maybe a vertical tab would work :P On Mon, Mar 7, 2016 at 1:57 PM -0800, "Joel Dueck" wrote: If this is the approach it would be better to use some uncommon unicode character (e.g. ? or ? http://www.fileformat.info/info/unicode/char/261e/index.htm) instead of a slash. |
Maybe we can use C99 one-line comment syntax. Using // would be visibly distinctive, shouldn't be that common in normal tweets, and feels like a rather nice fit for a service for hackers. |
We could define one reserved word, as in:
|
If we take IRC, you cannot start your text with a slash, too. If we need the date of the action, putting it into a normal message and prefixing it with / will work (with the drawbacks mentioned). If we don't need the timestamp, there is no real reason to integrate it as some kind of special message. So we are at:
again ;). Since I really want to have metadata in the twtxt, to finish the persistent storage for https://web.twtxt.org, it would be good to have a decision on this. /cc @buckket |
I would really like to have a defined order of metadata. For example it would be really useful for follow/unfollow command, or you can define multiple twturls and the last should be used for fetching but the others urls could still be used for collapsing mentions etc. |
Most mature IRC clients have a way of sending something that starts with a slash to a channel, whether by making the user write two slashes, press control-enter, or write What about
vs.
to distinguish actions from posts? Namely, actions and metadata start with a space, while posts start with a tab. |
More ideas on For a belt-and-suspenders approach, one could do
That is, posts match "{}\t{}" whereas actions match "{} /{}" (in Python str.format() minilanguage) |
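The belt-and-suspenders rule can be sketched with two regular expressions, assuming timestamps never contain whitespace. The names `POST`, `ACTION` and `classify` and the sample lines are illustrative only.

```python
import re

# Posts: timestamp, a tab, then the message ("{}\t{}").
POST = re.compile(r"^(\S+)\t(.*)$")
# Actions: timestamp, a space, a slash, then the action text ("{} /{}").
ACTION = re.compile(r"^(\S+) /(.*)$")


def classify(line):
    """Return ('post' | 'action' | 'unknown', timestamp, text) for one feed line."""
    match = POST.match(line)
    if match:
        return ("post", match.group(1), match.group(2))
    match = ACTION.match(line)
    if match:
        return ("action", match.group(1), match.group(2))
    return ("unknown", None, line)


post = classify("2016-03-07T12:00:00+00:00\t/me waves")
action = classify("2016-03-07T12:01:00+00:00 /follow alice https://example.org/alice.txt")
```

Because the separator decides the type, a post may still begin with a slash (as `/me waves` does here) without being mistaken for an action.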
In the above comments are examples of lines to be parsed as ...
Looking at these, it seems we could/should identify metadata as 0, 2 or 5, with 5 being most strict? // edited to add 9 |
@archusr thanks for summarizing! I think (2) and (5) are good ways, too. I implemented (2) in https://web.twtxt.org (and changed my https://dracoblue.net/twtxt.txt accordingly) but it is not a big deal to change it to (5). @buckket what do you think? |
If we're leaning towards option two or five, I would prefer 5, as we wouldn't have to code special cases to prevent /me from disappearing. I'll change txtnix accordingly. @quite, @DracoBlue would you change your clients too? Could somebody with more Python chops add it to twtxt and send a PR? |
@DracoBlue What do you like about 2 and 5 that you don't like about 0? Because it uses a space instead of a tab, there's no way for a user to accidentally make an action that was supposed to be a post — and I like that. |
Overloading of whitespace is fragile. Look at make. I would even argue, that twtxt shouldn't care what kind and what amount of whitespace is between timestamp and text. Think about all the editors that are autoconverting tabs to spaces. But that's probably an issue for another time... :) |
@mdom Yep!
would be more explicit. Actually 2+5 would be compatible to current clients. So we implement
In the alternative clients and somebody with python skills adds it with a PR to the official client? |
@mdom Makes sense. If you hate
then I'd suggest
because there's still no way to accidentally make an action. We could, of course, have one before-the-timestamp marker for actions and another before-the-timestamp marker for comments. |
would be great. Are we sure we want to standardize on 2 or 5 for the backwards-compatibility concerns of three clients and six users, all of which can probably be updated in two hours total? |
@adiabatic I'm a big fan of the |
Ok for me, too. Can somebody try how twtxt and current registries behave if |
Let's find out. I just updated my twtxt.txt with both version. |
http://twtxt.reednj.com/user/8c8d189d1c6f8810 handles (0) like a normal "post". The others don't appear. roster, registry and twtxt-ui ignore all versions in your posts. On Wednesday, 23 March 2016, Mario Domgoergen wrote:
|
twtxt dies with a stacktrace when parsing (1), but ignores (0) and (9). Separating the timestamp from metadata with a hash sign seems to be ignored by all clients. And we could still allow any kind of whitespace for normal tweets. 👍 |
Ok. So: TIMESTAMP#action param1 is the final version? |
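A hedged sketch of how a client might read the `TIMESTAMP#action param1` form while still accepting any whitespace between timestamp and text for normal tweets. It assumes a timestamp never contains `#` or whitespace; the function name, the `follow` action, and the example.org URL are hypothetical.

```python
import re

# Timestamp, then either a hash sign (metadata) or whitespace (tweet), then the rest.
LINE = re.compile(r"^([^\s#]+)(#|\s+)(.*)$")


def parse_line(line):
    """Parse one feed line into a metadata or tweet tuple; return None if it matches neither."""
    match = LINE.match(line)
    if match is None:
        return None
    timestamp, separator, rest = match.groups()
    if separator == "#":
        action, _, params = rest.partition(" ")
        return ("metadata", timestamp, action, params.split())
    return ("tweet", timestamp, rest)


meta = parse_line("2016-03-23T10:00:00+00:00#follow alice https://example.org/alice.txt")
tweet = parse_line("2016-03-23T10:05:00+00:00\tI like #hashtags")
```

Only the first separator after the timestamp is inspected, so a `#` later in a tweet stays part of the tweet text.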
Shall we vote? Until when? (Wait for >50% of 14 participants (=8) in this thread?) https://doodle.com/poll/gh27hhtixvbttvdp Result so far:
|
Let's vote in here with the emojis ;) |
I think 4 votes is clear! ;) |
txtnix and twtxt-roster both support the new syntax. |
I have a few questions here:
After giving it some thought, I’d rather stick with a very simple, yet robust concept:
This way we can strip all the unnecessary metadata by removing lines starting with Another reason why it might be good having metadata at the top without having to go through the entire file: HTTP Range Requests. If you want to check only the metadata, request only the first x bytes, where x is a number big enough to house all relevant information. Sorry for not responding sooner. |
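As a sketch of the range-request idea, the snippet below builds a request for only the first bytes of a feed with Python's standard `urllib.request`. The URL and the byte count are placeholders, and whether a server honours `Range` is up to that server.

```python
import urllib.request


def build_head_request(url, max_bytes=1024):
    """Build a GET request asking only for the first max_bytes bytes of url."""
    # Byte ranges are inclusive at both ends, so 0-1023 covers 1024 bytes.
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{max_bytes - 1}"})


def fetch_head(url, max_bytes=1024):
    """Fetch the start of a feed; returns (data, server_honoured_range)."""
    with urllib.request.urlopen(build_head_request(url, max_bytes)) as response:
        # 206 Partial Content means the range was honoured; 200 means the
        # server is sending the whole file, so stop reading after max_bytes.
        return response.read(max_bytes), response.status == 206


# Building the request does not touch the network; fetch_head() would.
request = build_head_request("https://example.org/twtxt.txt", 512)
```

A client would still need to cope with the last line of the fetched chunk being cut off in the middle.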
On Thu, Mar 24, 2016 at 08:45:55AM -0700, Felix Bayer wrote:
I think prepending a time stamp makes things easier for twtxt clients as And if we decide to not add a timestamp and in five weeks we find a Maybe we can have an optional timestamp and in case it's missing we just
I don't think anyone proposed that. The discussion was if the /command
I always liked the
If we just append, we could remember the end of the last request and |
Just a wild idea for now to keep it simple and open:
i.e. parameter[optional date/timestamp] literal or datatype and value |
one vs two files
If we want to put some meta data in an extra file: let's put most of the data in this extra file. Having
at the beginning, would allow us to reuse the ini style of twtxts config with its content: https://dracoblue.net/twtxt.meta:
Having additional The advantage of this approach is that range requests could really be applicable, since the meta head wouldn't change at all, or not that often. The information about followings and so on would be nice for "displaying" a profile page (like in twtxt-ui) and for having an officially supported way to store the information.
timestamp for meta
If I can see in my timeline at which time one of my followings started to follow somebody, it's quite nice ;). Having |
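A minimal sketch of reading an ini-style meta file with Python's standard `configparser`. The contents of the proposed twtxt.meta did not survive in this thread, so the section names (borrowed from the twtxt client config), the keys, and the example.org URLs below are assumptions.

```python
import configparser

# Hypothetical meta file content, modelled on the ini style of the twtxt config.
META_TEXT = """\
[twtxt]
nick = example
twturl = https://example.org/twtxt.txt

[following]
alice = https://example.org/alice.txt
bob = https://example.org/bob.txt
"""


def parse_meta(text):
    """Return (profile, following) dictionaries from ini-style meta text."""
    # Interpolation is disabled so that '%' characters in URLs are kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    profile = dict(parser["twtxt"]) if parser.has_section("twtxt") else {}
    following = dict(parser["following"]) if parser.has_section("following") else {}
    return profile, following


profile, following = parse_meta(META_TEXT)
```

One limitation of this shape is that a plain ini section holds each key once, so it records the current state rather than a dated follow/unfollow history.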
are there any conventions about this stuff yet? Or, just in general, any progress? |
A number of different issues and ideas have made clear the need for a place to specify metadata about a twtxt.txt feed. For instance, essentially every idea for notifications so far needs to know where the notifications should go (technical details vary based on the proposal). The question then is how to store metadata.
Discussion in #22 has suggested a general comment character, thus allowing clients to handle individually how the metadata would be stored. I suggest building on this, allowing for general comments, but making the following format specifically for metadata:
This echoes the .ini format of the twtxt config file, which I think gives it a nice consistency.
The other main suggestion for metadata is to have another file. I dislike this approach because it complicates the protocol, significantly increases how much twtxt has to hit the network, and requires either a second URL for each person (for the metadata file), switching twtxt.txt to hold metadata and having another file hold the feed, or putting a metadata entry in twtxt.txt that points to the metadata file.