-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing compose IDs is (practically) impossible if -
is allowed in product short name
#77
Comments
If I remember correctly, the assumption was that compose IDs should be opaque data and were not meant to be parsed. I can however see that there clearly are use cases where it's a relatively nice way to get some data. Fedora indeed does not have layered products. (A layered product generally is a non-operating system thing; the base product then is the operating system it's meant to run on.) Re Adding a rule to forbid dashes in short names would reduce the pain, and I think it can be done for new products. I'm not sure though if enforcing it at productmd level is the best idea (as opposed to doing it in tooling like Pungi to not allow creating new stuff that would be hard to work with). @dmach might have some more insight into this. |
"Re type_suffix: this is based on compose type (production, test, nightly etc.). There can never be a - character in it. For ga it's empty, in all other cases it's a . followed by one or two characters." Sorry, I was talking about the |
The release type is just a value that should describe the life cycle of the release. The possible values are listed at https://github.com/release-engineering/productmd/blob/master/productmd/common.py#L105-L112 If the value is As far as I know, Fedora is almost exclusively using For layered products this is a bit uglier: both the layered product and the base product can have types and they can be different (for example |
Is there any rule about having or not having a |
OK, it looks like neither
since the shortname can have both dashes and digits in it, we really can't reliably tell where the shortname ends and the release begins. My best plan so far is to do the parsing in reverse (just like the existing
so again we're stymied there, at least in theory. Do we actually have any release versions with dashes in them, though? Fedora only has 'Rawhide' and integers. If we don't have any existing products using a dash in their release version, I think we can make it possible to reliably parse the compose ID if we just change that regex and make it a rule that versions can't have a dash in them. That'd be a less disruptive change than disallowing dashes in shortnames, since Fedora is already using shortnames with dashes in. As a sidenote, btw, I'm not sure if that
|
I don't think there are any versions with dashes in them either. Originally the version was required to be numeric (with components separated by I think the regex works as intended, but that intention could be considered flawed: the version can be pretty much anything, but as long as it starts with a digit, it must follow the |
okay. so i think i'm gonna find a bit of time to see if I actually can write a reliable parser if we pretend that versions aren't allowed to have dashes in them, and if that's the case, it might be worth enforcing that rule. good to know the exception was only added for Rawhide, I wasn't sure about that. |
OK, so the best I could do is to write a parser which I think works if we make one change: make it so that versions can only be a 'normal' numerical-ish version (1, 26, 1.3.4.65.37, you know the drill) or 'Rawhide'. Or, at least, a specific list of strings, rather than allowing 'any string that doesn't start with a digit'. I don't think I can do any better than that - I don't think just enforcing 'no dashes in versions' is really enough, now I sat down and wrote it. The parser I wrote correctly parses all the compose IDs in fedfind's test suite, and also all the compose IDs in productmd's https://pagure.io/fedora-qa/fedfind/pull-request/7 I can adapt it as a productmd PR tomorrow, though, if you think it's worthwhile. |
I'm not sure if this is an issue or a cry for help...:)
Consider these compose IDs (one a real Fedora compose ID, one a made-up layered product compose ID from the test suite, since I don't know where to find a real RHEL layered product compose ID):
Fedora-Atomic-25-20170205.0
(product short name:Fedora-Atomic
)F-22-updates-BASE-3-updates-20160622.n.0
(product short name:F
)Now I read the code, all kinds of other subtleties are possible, which makes parsing compose IDs just super fun. But you could, I think, cope with many if not all of them by counting
-
characters. Maybe. I don't know. Life is awful.However, as long as
-
characters are allowed in the product short name, all bets are off, because it just becomes entirely impossible to figure out which-
s are being used as some kind of field separator by productmd, and which are just part of the product name (as in the first example).I've been doing this to parse compose IDs in Fedoraland:
The
rsplit
trick is to handle there being dashes in the shortname, and it relies on there being a known number of dashes after the shortname. But now I look at the test cases and code upstream, it's clear this is rather specific to my use cases and is relying on the fact that we don't have layered products (or at least, if we do, I don't care about any of 'em), or anytype_suffix
es (at least ones with dashes in them? I don't really know what these are), and I certainly don't care about RHEL 5 composes. But clearly someone does care about all those cases, so putting a generic compose ID parser in productmd (which is what I was trying to do) appears to be really rather futile, or at the least, would be very complex and have quite a bit of magic knowledge in it.So...I don't know what to do. Would it be worthwhile making it a rule that product short names can't have dashes in them, and changing Fedora's configurations appropriately (to use
FedoraAtomic
,FedoraCloud
andFedoraDocker
as the short names, or something)? Or should I just give up on generically parsing compose IDs and rejig everything I have that does it, somehow (probably by requiring a round trip to the compose location or to PDC to fetch the full compose metadata, unfortunately)? Currently we have various things that get useful information just by parsing the compose ID because that's the only thing they get from a fedmsg they parse, and it's nice to be able to avoid a round trip. Doing this kind of rejigging for fedfind would also be rather tricky architecturally, but that's more or less my problem and not yours...The text was updated successfully, but these errors were encountered: