-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC 34: Automatic redirects #34
base: main
Are you sure you want to change the base?
Conversation
The "create redirects for all subpages as well" option on moving a page seems like it could generate a mountain of redirects. Could we instead consider changing the default behaviour of redirects to be more like nginx, so that we do a prefix match on the source path and preserve anything after that prefix in the destination URL? That way, redirects on subpage URLs will Just Work with no additional records. |
Thanks @gasman. I was actually thinking of adding that in, but this raises some more questions: 1) If one of the sub-pages that have been moved is then later moved again, or deleted, how do we update the redirect? Say we move a section
This would mean that a requests to This isn't a problem with the "mountain of redirects approach", as the redirects will update automatically because they are pointed directly at the individual page objects. 2) How do we know that something exists at the other end of the redirect? Say, one of the subpages is later deleted but not replaced with a new redirect. The prefix redirects will redirect them to a 404 page, is this bad practice? One solution to this could be to find the page that the request is being redirected to, so we can check it exists and is live. If not, serve up a 404. 3) How do we make it easy for users to find a redirect in the admin if there are prefix redirects as well? I think we would need to add some logic in to the search so that a search for 4) If there is a page at the root of the prefix redirect, should we still fire it for sub paths? Say we have a prefix redirect from 5) What about deleting an entire sub-section? Should we also support prefix-redirects that can be configured to redirect every sub path to a single page? Mountains of redirects I know the "mountain of redirects" idea sounds a bit heavy, but I think it does have a few advantages:
|
@gasman @chosak @Scotchester @rosskarchner I noticed that you were all very quick to show your support for prefix matching. Please could you let me know if you have any thoughts on the issues I raised with it? |
I think leaving a chain of redirects in place is acceptable enough. For it to make any noticeable difference (as opposed to just adding a redundant ~100ms round trip), we'd be getting into the realm of pathological artificial scenarios - on a real world site we might expect a page to move once a year or thereabouts, and in the case a several-year-old URL, the fact that we have a functional redirect at all (even if it's less direct than it could be) means we're doing better than most sites - and, of course, better than the Wagtail status quo... If we do decide that redirect chains are a problem that needs to be addressed, then we can handle them by following the chain server-side, as you propose. By contrast, creating individual redirects for each sub-path is something that could easily kill a server in a real-world setup, given a sufficiently careless editor... imagine a site with 100k+ news articles, and an editor changing the slug of the news index. To me, creating an unbounded number of records as a result of a single user action is a 'code smell' - for example, that's why page permissions propagate down the tree 'logically' rather than creating permission records for every descendant (which is what something like django-guardian would do).
As above - I'm not aware of any negative consequences to having a redirect chain ending in a 404 (other than unrealistic pathological cases), and if there are any, we can follow the chain server-side.
Yes. Any time we have to answer the question "what redirects exist at path X?" - whether in the redirect middeware itself, the search interface, or the proposed new UI - we have to look for prefixes as well as exact matches.
Why would they need any special treatment? Redirect chains can exist in the current implementation, if the user explicitly sets them up through URL-based redirects (i.e.
Yes. The logic would be: If we arrive at the redirect middleware having hit a 404 for the current path, look for redirects whose old_path is a prefix of the current path, and take the longest (most specific) one. Then form a destination URL consisting of the redirect's destination plus the remainder of the current path. We never need to consider whether redirect.old_path is itself a 404, and I can't see any advantage in explicitly checking that. This does mean we have to revisit the notion that a 'masked' redirect (one with an old_path that is not a 404) is functionally useless. I don't think anything in Wagtail relies on that assumption though - as far as I'm aware, this RFC is the first time that anyone has explicitly made that observation :-)
I'd say that's a nice-to-have, but not essential for an MVP. |
Thanks for the response! Ok, I definitely agree that creating 100k redirects wouldn't be ideal! I must've been thinking on a smaller scale. I think that case would be very rare though, but I wouldn't want to suggest switching between the two based on the number of pages being moved. So happy to just go with prefix redirects. We should think carefully about the logic and UI though. I think redirect chains are going to become much more common after we start automatically generating prefix redirects. Another issue that I've just noticed, is prefix redirects can easily be used to create infinite loops. If someone renamed a section from
If someone then visits Even then, detecting loops might not be so easy. As you could create a redirect loop that requests a different path each time:
Which would redirect like:
I can't think of a nice solution to this one. Limiting the number of redirects maybe? |
To prevent redirect loops, I think we would need to resolve the full redirect chain on the server. And while doing that, we should keep track of the prefix redirects that have already been followed and make it a rule that prefix redirects will never be followed twice. So in the above example:
If a user requests
Does that sound OK? |
Another edge case: it's possible to create two prefix redirects for the same URL:
If a user requested I think in cases like these, the logic should always choose the most specific redirect so |
What if the redirect is going to a URL that's outside of Wagtail/Django's knowledge - for example, the public-facing hostname is a proxy that forwards to several different apps, only one of which is the Wagtail site? In that situation, we could end up rejecting a valid redirect because Wagtail's incomplete knowledge of the URL scheme would have it going to a 404 or a redirect loop. I think it wouldn't be the end of the world if we just let redirect loops happen - current browsers will spot them (albeit possibly only after a large number of HTTP hits...) and Wagtail doesn't promise to protect sites from all effects of admin-user errors (aka "garbage in, garbage out" :-) ) Maybe we can detect and warn about potential redirect chains/loops in the admin interface, if there's a good heuristic for spotting them (e.g. the destination of one redirect is a prefix or extension of the source of another redirect).
Yes. |
For another time I think
43f946b
to
1c1c418
Compare
I think this would be hard to implement
Updated RFC. |
text/034-redirects-admin-changes.md
Outdated
@@ -54,10 +54,6 @@ If this is checked when the page is published with the new slug, a redirect to t | |||
|
|||
When unpublishing or deleting a page, the user can add a redirect from that page’s URL to another page. If they create one, they will be given the option to repoint existing redirects at the new page, delete the existing redirects, or (when not deleting) do nothing. | |||
|
|||
#### Deleting redirects when pages are created in their place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding this to a "Not covered" section with reasons as to why not?
This way 1 year down the line we'd have a good reference.
It was mainly the fear of having hundreds of redirects created at once if a large section of the site was moved, but I think the ensuing discussion has done a good job addressing that :) |
Should we add a setting to enable resolving redirect chains in the backend? We won't do this by default in case the site has some custom routing that Wagtail isn't aware of which might lead to a redirect being seen as a chain when it is not. Most sites won't have this problem so we could allow this behaviour to be enabled with a setting (eg, |
text/034-redirects-admin-changes.md
Outdated
|
||
To allow creating redirects of this type, a new "Match prefix" checkbox/boolean field will be added to redirects. | ||
|
||
#### Choosing the most specific redirect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other issue is if a page is moved from a
to b
then back to a
. We will end up with two prefix redirects a
=> a
and b
=> a
We probably want to exclude prefix redirects which point to a page that lives at the source path, as this would turn all sub paths into redirect loops.
👍 super useful feature - Haven't followed this conversation, just wanted to add that we had this talk in django-wiki/django-wiki#154 on the same issue some years back. In the end, we settled on a similar model in django-wiki/django-wiki#640 -- and I think for the current RFC to be even better, you might want to scroll up and down that PR. I'd suggest maybe adding "permanent redirect" for clarity in some of the user-facing examples and the technical RFC specs -- it's important for webmasters to know that search engines will update their links. An overview of redirects with created date and number of hits would make it easy to clean up later in life.. but that's just icing :) |
text/034-redirects-admin-changes.md
Outdated
|
||
When a page’s slug is changed, a checkbox will appear underneath the slug field asking the user if they would like to create a redirect from the old slug. | ||
|
||
If this is checked when the page is published with the new slug, a redirect to the page will be created from the previous URL of the page. If the page has any children, this redirect will be a prefix redirect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a slug change and move functionality are essentially the same and should be merged? Such that a once published page doesn't have a text input field for the slug but a Move button?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to me like a reasonable solution to the question at line 51
Q: What if the page is saved as draft? The redirect cannot be created yet but the user who publishes it may not be aware that there is a checkbox on the promote tab.
text/034-redirects-admin-changes.md
Outdated
|
||
When a user then requests, ``/articles/the-article/`` this will be matched by the prefix redirect and the user will be redirected to ``/news/the-article/``. | ||
|
||
To allow creating redirects of this type, a new "Match prefix" checkbox/boolean field will be added to redirects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we achieve this by simply allowing the user to add an asterisk to the end of the path? We could use a regex validator to ensure they aren't added elsewhere, and we can set a flag on save if we need a convenient way to tell the two types apart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the current RFC text. Wagtail assumes that a redirect always implies the full nested hierarchy, i.e. *
. So if one moves article/
to news/
then nothing can be nested in article/
anymore, since any request for article/<slug>
redirects to news/<slug>
... if I understand it correctly? This way users won't be burdened with exceptions, priorities and the whole idea of URL resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjaoming My point refers specifically to how this could be handled in the UI... so, instead of having a separate "Match prefix" checkbox/boolean, the user would simply add an asterisk to the end of the path. They could do the same thing for the target path too (or leave it off to direct all matches to a specific path).
Wagtail assumes that a redirect always implies the full nested hierarchy, i.e. *
Not always... only if the "Match prefix" checkbox/boolean field
So if one moves article/ to news/ then nothing can be nested in article/ anymore, since any request for article/ redirects to news/... if I understand it correctly?
Redirects are only queried if no page can be found matching the requested URL... so, if you added a page to a location covered by a previous redirect, the page would be served at that URL (no redirection would take place)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the asterisk option is likely to confuse people, especially non-technical users. I think it might be best to explicit write out the behavior of each option, maybe in a radio select. Maybe something along these lines:
- Redirect any url prefixed with this slug to the new slug (
/old/foo
→/new
), e.g. if deleting a whole tree - Redirect "children" of this slug to be children at the new slug (
/old/foo
→/new/foo
), e.g. if moving a whole tree
text/034-redirects-admin-changes.md
Outdated
|
||
![](https://d2mxuefqeaa7sj.cloudfront.net/s_C6D086527C63F45E9EA73587C2533A89CBE7313C89FA1DC0882B73424CAB09BB_1552329309392_Screenshot_2019-03-11+Wagtail+-+Exploring+Welcome+to+the+Wagtail+Bakery+1.png) | ||
|
||
We will add a “redirects” icon at the top right of the page explorer. This will indicate the number of redirects where their “from path” is from within this section. Clicking this will link the user to the redirect management which would be filtered to show only redirects from this section. This button will only be visible to users who have permission to manage redirects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could alternatively show the redirects in the page list without actually making them a page type. I think that it would be a bit more intuitive (since redirects do feel a little like pages) and make it less likely that people would forget that they have lots of redirect clutter. I might show them like pages but with a much smaller height, to indicate that they are not actually pages (and so that they don't entirely clutter the page list). And then maybe have a toggle for whether they're shown in the list.
In the 28 Oct core team meeting, we discussed the possibility of adding themotleyfool/wagtail-automatic-redirects to Wagtail core. I think that people's concerns in that meeting were generally in line with this RFC. A few considerations that we could bring into this: Adding back the section about automatic deletion of redirects when a new page is made at that location.This was mentioned in the core team meeting as a desirable feature, and seems like what a user would intuitively expect to have happen. @kaedroho, your comment on the commit removing this section said that sounded hard to implement. What challenges do you foresee? I imagine that we would check for conflicting redirects in the same was as checking whether there is already a page using the same slug. Making sure there is an easy way to turn off automatic creation of redirects@kaedroho suggests the option of removing the |
Thanks for the summary @nmorduch.
This approach sounds a little overzealous when adding flag/setting for this particular feature would be incredibly simple. |
Worth flagging up that we'd want to avoid making |
I don't think we should add settings for things that are purely UI preferences. If there's a technical reason to disallow redirects being created, removing the app from
One problem with signals/hooks is we have to support them like external APIs, even if they are only supposed to be used internally. This can make refactoring code harder later so I think we should be careful about where and why we add them. Could we treat hooks that aren't very useful outside Wagtail differently, so it's easy for us to change/remove them later?
I can't think of any past cases that required a custom redirects model/module. What would the benefit be for making this extensible? |
I agree entirely. I was referring specifically to disabling the automatic creation of redirects. I think it would be shortsighted to assume that all projects would want that turned on - regardless of how well it is designed.
Having gone 5+ years without this feature, it's reasonable to expect there will be plenty of 'own-rolled' solutions out there that might not be simple to strip out, or are built to custom requirements that the Wagtail-native version won't quite cater for. It's perfectly valid to want to keep using the redirects app in a project without that feature turned on (as all users of the app are currently doing so). By all means have the feature turned on by default. But, in the spirit of keeping Wagtail as unopinionated as possible, It would be nice to see it made optional. |
Ahh, yep that's a good point! Editors on existing sites wouldn't expect redirects to be created automatically as this didn't happen in the past, so we should make it optional so that existing workflows are unaffected. I think this option should be given to the editor though. Maybe we should add a checkbox under the "slug" field and in the move UI that's unchecked by default? This would let them know the feature exists, but they can ignore it if they want to.
We could say this about almost any new feature we add into Wagtail. There's almost always going to be some custom solutions out there that we need to take into account (this was the case for workflow and internationalisation too), but I don't think maintaining backwards compatibility for them is the highest priority for us. Hopefully, in most cases, they would switch over to Wagtail core's implementation. For those that don't, there is usually some unofficial way to disable features in Wagtail (overriding the redirects module would be possible in this case, there's a good chance custom implementors have already done this or are not even using it). I know that settings are easy to add, but each new setting adds a little maintenance burden, adds a bit of bloat to our docs, and potentially makes it more difficult for us implement new features in the future. This might not be a problem for one feature, but these issues will accumulate over time if we aren't strict about when we add new settings.
I think the admin UI is a part of Wagtail we can be opinionated about! |
My current stance is still in favour of creating one redirect for each page that is moved rather than using prefix redirects. I think this is a much simpler solution and doesn't have difficult issues like redirect chains/loops that could cause performance issues (see #34 (comment)) I think the main argument against this seems to be having to create lots of redirects if someone moves a large section. But I think this is a type of performance issue that we already have in Wagtail (eg, moving/deleting lots of pages may require lots of cache invalidations, and search reindexing too!) and also this case is very rare. We could perhaps solve it more generally by running these kinds of tasks in batches in subsequent requests. Some advantages of this approach are:
|
I'm not in favour of giving a content editor the option to opt-out of automatic redirect creation. I'd like this feature to be a project level decision. It makes writing, maintaining and using the feature easier... Automatically created redirects can be managed via the redirects app. Maybe I'm missing something. Why would a content editor opt-out? |
Hey everyone, I have updated the text now to reflect the decision made in the recent team meeting to not use prefix redirects. It's still possible to go back to prefix redirects even after this has been implemented. |
As discussed with @ababic and @allcaps on the #automatic-redirects-working-group slack channel, I'm going to update this to remove the UI that allows editors to opt-out. Instead, we will use a project-level setting for this. I'll also include some other features discussed like a flag so we know which redirects were auto-generated, a filter to allow users to hide auto-generated redirects in the admin UI. Also, there was some discussion about showing the number of redirects that point to a page on the promote tab which I'll add here also. Let me know if I've missed anything there. Also, any thoughts on whether this setting should be on by default? |
1909a10
to
ca15c41
Compare
ca15c41
to
9fb26d7
Compare
Development has begun on this feature! 🎉 The first wave of development focuses on the backend implementation, and is split across three PRs:
Am due to have a catchup next week with other team members regarding the UI improvements, and there is further development time pencilled in for that work at a later time. Big thank you to The National Archives for sponsoring development of this feature so far. |
wagtail/wagtail#7809 was replaced by wagtail/wagtail#7827 and released in https://docs.wagtail.org/en/stable/releases/2.16.html#automatic-redirect-creation. So I suggest we merge this finally |
We discussed some of these ideas in a core team meeting recently, I've added a few more items for discussion.
Sorry for lack of screenshots at the moment, will add some more later.
Rendered
Please don't view the screenshots as designs. I really just wanted to show the existing interfaces to help illustrate the descriptions. Hoping someone with design skills would chip in on this (ping @wagtail/ux-design)