Add workarounds for anti-framing scripts #636

Mr0grog · 2020-11-11T03:47:16Z

Some pages have anti-framing/anti-clickjacking code that checks whether the page is in a frame and hides the content and/or attempts to redirect the top frame to the page. For example, https://www.census.gov/programs-surveys/economic-census.html has this code in its `<head>`:

```html
<head>
  ...
  <style id="antiClickjack">body { display: none; }</style>
  <script type="text/javascript">
    if (self === top) {
      var antiClickjack =  document.getElementById("antiClickjack");
      antiClickjack.parentNode.removeChild(antiClickjack);
    } else {
      top.location = self.location
    }
  </script>
  ...
</head>
```

Since we show pages in iframes, this is a problem. We set restrictions on the frame’s code so it can’t redirect the top frame, but this still leaves us with a blank page (and in some cases, a broken page because the script might throw an exception). Some workaround ideas:

Inject a script that runs after page load (or maybe just at the end of the page?) and checks whether the `html` or `body` element’s computed style has `display: none` or `visibility: hidden`. If so, explicitly set the element’s style to `display: block; visibility: visible;`. Something like:
```javascript
[document.documentElement, document.body].forEach(function ensureVisible(element) {
    // Assumes this runs once <body> exists (e.g. on load); otherwise
    // document.body is null and getComputedStyle() throws.
    var style = getComputedStyle(element);
    // Check and set these in one go because setting one and then checking
    // the next will cause layout thrashing.
    if (style.display === 'none' || style.visibility === 'hidden') {
        element.style.display = 'block';
        element.style.visibility = 'visible';
    }
});
```
Some downsides: it won’t fix pages where the script errored out, and it won’t work if the thing being hidden is some arbitrary wrapper element in the page rather than `html` or `body` (although we could maybe come up with some heuristics for that).
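A variation on the snippet above, as a sketch: the original loop still writes to `html` before reading `body`’s computed style, so this version reads every candidate’s style first and only then writes inline styles. It also takes the candidate list and the style lookup as parameters, so a future wrapper-element heuristic could pass in more elements and the logic can be exercised outside a browser. `unhideElements` is a hypothetical name, and it is written in ES5 style like the snippet above, on the assumption that it would run inside arbitrary archived pages.

```javascript
/**
 * Hypothetical helper: force hidden elements to be visible. All style reads
 * happen before any writes, so styles aren't recalculated between elements.
 *
 * @param {Array<Element|null>} elements Candidates; nulls are skipped
 *   (e.g. document.body before <body> has been parsed).
 * @param {function(Element): {display: string, visibility: string}} getStyle
 *   Style lookup; in a browser, wrap getComputedStyle.
 * @returns {Array<Element>} The elements whose inline styles were changed.
 */
function unhideElements(elements, getStyle) {
  // Phase 1: read computed styles only.
  var hidden = elements.filter(function isHidden(element) {
    if (!element) return false;
    var style = getStyle(element);
    return style.display === 'none' || style.visibility === 'hidden';
  });

  // Phase 2: write inline styles only. Inline styles beat normal stylesheet
  // rules; rules marked !important would need
  // element.style.setProperty(name, value, 'important') instead.
  hidden.forEach(function makeVisible(element) {
    element.style.display = 'block';
    element.style.visibility = 'visible';
  });
  return hidden;
}
```

In a page, this might be called from a `load` listener as `unhideElements([document.documentElement, document.body], function (el) { return getComputedStyle(el); })`.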

Wrap any scripts on the page in a `with` block whose object acts as a proxy for `window`. For example, we’d transform the above example from census.gov to:

```html
<head>
  ...
  <style id="antiClickjack">body { display: none; }</style>

  <!-- Insert this element before the first <script> tag -->
  <script type="text/javascript">
    // Create a fake `window` object that makes `self` and `top` look identical.
    if (window.Proxy) {
      window.WINDOW_PROXY = new Proxy(window, {
        get(target, prop, receiver) {
          if (prop === "top" || prop === "self" || prop === "window") {
            return receiver;
          }
          return Reflect.get(target, prop, target);
        }
      });
    } else {
      window.WINDOW_PROXY = {self: window, top: window};
    }
  </script>

  <!-- Wrap the contents of any <script> tags in `with (WINDOW_PROXY) {...}` -->
  <script type="text/javascript">
    // Wrap the original contents of the script so properties are grabbed from
    // a special proxy object.
    with (WINDOW_PROXY) {
      if (self === top) {
        var antiClickjack =  document.getElementById("antiClickjack");
        antiClickjack.parentNode.removeChild(antiClickjack);
      } else {
        top.location = self.location
      }
    }
  </script>
  ...
</head>
```
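The transformation above could be done as a string rewrite wherever the snapshot HTML is prepared. This is a minimal sketch that assumes a regex is good enough for illustration; a real implementation should use an HTML parser, since regexes mis-handle things like `>` inside attribute values. `wrapInlineScripts` and its helpers are hypothetical names. The sketch skips external scripts, module scripts (always strict mode, where `with` is a syntax error), and non-JavaScript types such as JSON-LD, and it puts the wrapper’s closing brace on its own line so a script ending in a `//` comment can’t swallow it. It does not add the `WINDOW_PROXY` setup script; that has to be inserted after this runs, or it would get wrapped too.

```javascript
// Hypothetical sketch: wrap inline classic scripts in `with (WINDOW_PROXY) {...}`.
// Regex-based for brevity; a real implementation should use an HTML parser.
const SCRIPT_PATTERN = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;

// `type` values that mean "classic JavaScript" ('' means no type attribute).
const CLASSIC_SCRIPT_TYPES = new Set([
  '',
  'text/javascript',
  'application/javascript',
  'text/ecmascript',
  'application/ecmascript',
]);

// Return an attribute's value from a tag's attribute string, or null if absent.
function getAttribute(attributes, name) {
  const pattern = new RegExp(
    `(?:^|\\s)${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s"'>]+))`,
    'i'
  );
  const match = pattern.exec(attributes);
  if (!match) return null;
  return match[1] || match[2] || match[3] || '';
}

function shouldWrap(attributes, body) {
  if (getAttribute(attributes, 'src') !== null) return false; // external script
  if (body.trim() === '') return false;
  const type = (getAttribute(attributes, 'type') || '')
    .split(';')[0]
    .trim()
    .toLowerCase();
  // Skips type="module" (strict mode, so `with` is a SyntaxError) and
  // data blocks like application/ld+json or text/template.
  return CLASSIC_SCRIPT_TYPES.has(type);
}

function wrapInlineScripts(html) {
  return html.replace(SCRIPT_PATTERN, (tag, attributes, body) => {
    if (!shouldWrap(attributes, body)) return tag;
    // Caveat: top-level `let`/`const`/`class` declarations become block-scoped
    // inside the wrapper, so other scripts on the page can no longer see them.
    // The newlines keep a trailing `// comment` from hiding the closing brace.
    return `<script${attributes}>with (WINDOW_PROXY) {\n${body}\n}</script>`;
  });
}
```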

This isn’t perfect either: it only covers inline scripts, not external ones (e.g. `<script src="some_url"></script>`), and the fallback that doesn’t use `Proxy` could be error-prone in other ways (maybe we just don’t support that case?). We could also extend this approach to work around some of the errors the iframe sandbox causes (e.g. reading or setting `document.cookie`).
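One way the `Proxy` version could be error-prone too: inside `with (WINDOW_PROXY)`, a bare call like `setTimeout(...)` or `addEventListener(...)` is made with the proxy as `this`, and browsers typically reject built-in `Window` methods called on anything other than a real window ("Illegal invocation"). Below is a sketch of a `get` handler that also binds such functions to the real window. It assumes that a function without its own `prototype` property is a plain method rather than a constructor (built-ins like `setTimeout` have none; constructors like `Image` do). `createWindowProxy` is a hypothetical name, and spoofing `parent` as well is an assumption beyond the original snippet.

```javascript
// Properties that should make the page think it's the top-level window.
// `parent` is an assumption: the original snippet only spoofs self/top/window.
const SPOOFED_WINDOW_PROPERTIES = new Set(['self', 'top', 'window', 'parent']);

function createWindowProxy(realWindow) {
  // Reuse bound functions so repeated lookups return the same object
  // (e.g. `addEventListener === addEventListener` stays true).
  const boundFunctions = new WeakMap();

  return new Proxy(realWindow, {
    get(target, prop, receiver) {
      if (SPOOFED_WINDOW_PROPERTIES.has(prop)) {
        return receiver;
      }
      const value = Reflect.get(target, prop, target);
      // Heuristic: functions without an own `prototype` (setTimeout,
      // addEventListener, object methods) aren't constructors, so bind them
      // to the real window. Constructors are returned untouched so
      // `new Image()` and `Image.prototype` keep working.
      if (typeof value === 'function' &&
          !Object.prototype.hasOwnProperty.call(value, 'prototype')) {
        let bound = boundFunctions.get(value);
        if (!bound) {
          bound = value.bind(target);
          boundFunctions.set(value, bound);
        }
        return bound;
      }
      return value;
    },
  });
}
```

In a page, the setup script would then do `window.WINDOW_PROXY = createWindowProxy(window);` instead of building the proxy inline.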

REALLY complex: add a service worker to essentially do the above to external scripts. This probably won’t work in a lot of cases (service workers don’t always apply) and may not really be worthwhile. It’s probably better accomplished by something even messier: rewriting all script URLs so that the front-end server proxies them and does this wrapping. On the other hand, proxying and rewriting (kinda like Wayback/the Memento API) would solve lots of other issues, like CORS problems.
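The URL-rewriting half of that could look something like this sketch. Everything in it is hypothetical: the `/script-proxy?url=<encoded URL>` endpoint and the `proxyOrigin` value don’t exist, and the proxy would still need to fetch each script and apply the same `with (WINDOW_PROXY) { ... }` wrapping. Relative `src` values are resolved against the captured page’s original URL, since they would otherwise resolve against the proxy. It uses a regex rather than an HTML parser, only handles quoted `src` values, and doesn’t decode HTML entities such as `&amp;` in URLs.

```javascript
// Hypothetical endpoint on the front-end server that would fetch the original
// script, wrap it in `with (WINDOW_PROXY) { ... }`, and serve the result.
const SCRIPT_PROXY_PATH = '/script-proxy';

// Build a proxied URL for one script `src`, resolving relative URLs against
// the URL of the page that was captured.
function proxyScriptUrl(src, pageUrl, proxyOrigin) {
  const original = new URL(src, pageUrl);
  const proxied = new URL(SCRIPT_PROXY_PATH, proxyOrigin);
  proxied.searchParams.set('url', original.href);
  return proxied.href;
}

// Rewrite the quoted `src` attribute of every <script> tag in an HTML string.
function rewriteScriptSources(html, pageUrl, proxyOrigin) {
  return html.replace(/<script\b[^>]*>/gi, (tag) =>
    tag.replace(/(\ssrc\s*=\s*)(["'])(.*?)\2/i, (match, prefix, quote, src) => {
      try {
        const proxied = proxyScriptUrl(src, pageUrl, proxyOrigin);
        return `${prefix}${quote}${proxied}${quote}`;
      } catch (error) {
        return match; // `new URL` throws on malformed values; leave them alone
      }
    })
  );
}
```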

Any other ideas? These are the only two obvious approaches that jump out at me.

It might make the most sense to do a combination of the above. We could also push this into the HTML differ instead of doing it here in the front-end.


Mr0grog · 2020-11-11T03:59:45Z

Example view with this problem: https://monitoring.envirodatagov.org/page/1de9a11d-330b-4a87-9926-6c6357b6f668/7dd9970c-b2ad-4cbf-aaa2-98e915ee03b0..7f6fb147-fb5a-481d-8d82-f0018c2449fd

Mr0grog · 2021-02-17T20:21:48Z

Seems like this is in use by USDA, too: https://monitoring.envirodatagov.org/page/70a43879-1f5a-4530-b174-f2a510aec11e/c97ac84e-d837-4126-9365-fae4a9a85cb6..95bf5ef1-61f3-47b8-8b85-f29146df86ae
(That’s a capture of https://www.nrcs.usda.gov/wps/portal/nrcs/main/national/about/history)

Mr0grog · 2021-02-17T20:23:42Z

FWIW, the “right” long-term solution is to serve pages and diffs through a proxy (which probably needs its own subdomain to be safe) that acts kind of like a Memento API, and maybe uses Wombat (from pywb). That’s a lot of work, though, so I’m thinking about smaller improvements we can make here.
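For a sense of what such a proxy’s URLs could look like: Wayback (`/web/{timestamp}/{url}`) and pywb (`/{collection}/{timestamp}/{url}`) address a capture by a 14-digit `YYYYMMDDhhmmss` UTC timestamp plus the original URL. This sketch builds that kind of URL for a version, assuming a hypothetical dedicated replay origin (`https://replay.example` is a placeholder), a `/web/` prefix borrowed from Wayback, and a capture time available as a `Date`.

```javascript
// Format a capture time as a 14-digit UTC timestamp (YYYYMMDDhhmmss), the
// convention Wayback and pywb use in replay URLs.
function toArchiveTimestamp(date) {
  // toISOString() is always UTC, e.g. "2020-11-11T03:47:16.000Z".
  return date.toISOString().replace(/\D/g, '').slice(0, 14);
}

// Build a Wayback-style replay URL on a separate origin. String concatenation
// (rather than `new URL`) keeps the original URL exactly as captured.
function toReplayUrl(replayOrigin, captureTime, originalUrl) {
  const origin = replayOrigin.replace(/\/$/, '');
  return `${origin}/web/${toArchiveTimestamp(captureTime)}/${originalUrl}`;
}
```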

Mr0grog added the bug label Nov 11, 2020
