Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List Resources: improve null checks on dates #892

Closed
victorlin opened this issue Jun 5, 2024 · 6 comments · Fixed by #1073
Closed

List Resources: improve null checks on dates #892

victorlin opened this issue Jun 5, 2024 · 6 comments · Fixed by #1073

Comments

@victorlin
Copy link
Member

1

from #874 (comment)

function _snapshotSummary(dates: string[]) {
const d = [...dates].sort()
if (d.length < 1) throw new Error("Missing dates.")
const d1 = new Date(d.at( 0)!).getTime();
const d2 = new Date(d.at(-1)!).getTime();
const days = (d2 - d1)/1000/60/60/24;

@ivan-aksamentov's recommendation:

Why is this bang needed? the preceding line ensures that there is at lease one element in d.

Compiler is not smart enough to figure this out from this kind of the conditional. Connecting .length to .at() is tough for a type system.

What it can easily figure out is null checks. So a cleaner way would be to check and throw after non-fallible array access:

const t0 = d.at(0)?.getTime()
if(isNil(t0)) { // careful to not exclude the legit value `0`
    throw ...
}
// `t0` is guaranteed to be `number` in this branch

I would go further. Finding first and last value is a common enough "algorithm" that I would introduce a utility function:

export function firstLastOrThrow<T>(a: T[]): [T, T] {
    // TODO: use .at() or lodash head() and tail() along with null checks 
}

This would make the code very pretty and safe, with all dirty details hidden:

const [t0, t1] = firstLastOrThrow(dates) // guaranteed to return a pair of numbers or to  throw

2

from #874 (comment)

function _draw(ref, resource: Resource) {
// do nothing if resource has no dates
if (!resource.dates) return
/* Note that _page_ resizes by themselves will not result in this function
rerunning, which isn't great, but for a modal I think it's perfectly
acceptable */
const sortedDateStrings = [...resource.dates].sort();
const flatData = sortedDateStrings.map((version) => ({version, 'date': new Date(version)}));

const x = d3.scaleTime()
// presence of dates on resource has already been checked so this assertion is safe
.domain([flatData[0]!.date, new Date()]) // the domain extends to the present day

@ivan-aksamentov's recommendation:

The problem with comments like this:

// presence of dates on resource has already been checked so this assertion is safe

is that 3 months from now a fresh intern comes and inserts new code between the check and the assertion (not knowing that this assertion exists - it's hard to find even when looking for it specifically) and then 💥. Though, to be fair, it did not happen with js previously (due to lack of interns?)

Again, a small wrapper would make it safe and clean:

export function at<T>(a: T[], index: number): T {}

If you can find a fallback value, then something like this might work:

flatData[0]?.date ?? new Date()

This cannot fail and requires no hacks.

Continuing with the wrapper (though not directly applicable here sadly):

export function at<T>(a: T[], index: number, fallback?: T): T {}

Another option, although more involved, is to write a custom array type which always returns value or throws and never returns undefined. There might be libraries implementing non-empty arrays.

@ivan-aksamentov
Copy link
Member

ivan-aksamentov commented Jun 5, 2024

To shorten a bit:

  • Try to narrow down the checks to null checks
  • Make wrappers for common operations to avoid scattering the checks everywhere (there might be libs for that)

(Sadly, compilers are not yet smart enough to take array length checks into account. The length is only known at compile time, so in some cases it is plain impossible to decide at build time).

@victorlin
Copy link
Member Author

@ivan-aksamentov sorry for the late reply. I'm revisiting this while doing some TypeScript conversion in Auspice.

Make wrappers for common operations to avoid scattering the checks everywhere

I tried at(flatData, 0).date and it doesn't feel right. Are we supposed to replace every array [] access with this function? That seems tedious.

I don't see how this avoids scattering checks. There would still be a potential error from that function. Example:

function at<T>(a: T[], index: number): T {
  const value = a[index];
  if (value === undefined) {
    throw new Error("Index out of bounds!");
  }
  return value;
}

let firstDate;
try{
  firstDate = at(flatData, 0).date;
} catch (e) {
  // TODO: handle this
}
const x = d3.scaleTime()
  .domain([firstDate, new Date()])
  // ...

@ivan-aksamentov
Copy link
Member

ivan-aksamentov commented Nov 8, 2024

@victorlin I don't remember anything about it. Here is a current brain dump :)

Are we supposed to replace every array [] access with this function? That seems tedious.

The choice here is that you could either insist on using syntactic sugar bracket notation in an unsafe way and defeat the compiler checks or you could combine it with a check - runtime and compile-time. Then, if checks appear in many places, it makes sense to extract a function. That's how I ended up with my thesis.

You are doing the check in the original code anyways, it's just it is disconnected from the array access. Which is dangerous, because there can be any amount of code between the check and array access in the future. Someone can come and refactor them apart into different functions, etc.

Not every place needs this, so no, you won't replace every array access with this. In many places you'd be able to find a default, then arr[0]?.foo ?? defaultFoo would do. In some places undefined is fine and no check is needed (imagine if .domain() would accept infinite ranges where one or both boundaries can be undefined). There could be other situations.

You would not need to catch the exception. You don't catch it in the original code. What would you do in the TODO: handle this case?

There needs to be a distinction:

  • unexpected errors: programmer mistakes, states which are impossible to reach in a bug-free program. In these situations you need to crash as soon as possible so you don't ship a bug. In real programming language you'd use assertions. In js - you let an unhandled exception to propagate all the way. I like to make a dedicated exception type for that, so that you know to not handle it, or to handle specially (e.g. insist that users should report this as a bug if they see it).
  • expected errors: file not found, invalid user input, network down etc. - these need exceptions and graceful handling

Also, conceptually this is not about array accesses here. Here you want to find the first and the last date, and it's a well-known algorithm. Wrapped into a function it will look much better. Then the checks become preconditions to the algo, meaning assertions, the first type of errors. There will be much less questions and design decisions to make this way. Constant inline reimplementation of well-known algorithms, especially small ones like this, can be a constant source of headaches.

Pragmatically, if all that feels too tedious, then perhaps it makes sense to turn off this particular check in tsconfig. After all, it worked fine in js. This particular case won't make program much safer. So I am def being pedantic here :)

@victorlin
Copy link
Member Author

victorlin commented Nov 14, 2024

@ivan-aksamentov thanks! I like the framing around error handling instead of the vague "improve null checks".

This is what I came up with in #1073 for each of the examples in the issue description:

  1. 4200b79
  2. 8afb18d

And as for error handling, it's the first commit in that PR: d6741a5

If you have some time to review, it would be helpful in shaping how we use TypeScript in our codebases 🙏

@ivan-aksamentov
Copy link
Member

ivan-aksamentov commented Nov 14, 2024

@victorlin I like it!

I think _snapshotSummary() can be further improved by splitting it into 2 parts:

function _snapshotSummary(dates: string[]) {
   const { first, last } = rangeBounds(dates)
   return humanizeDuration(last - first)
}

This makes the body self-documented - gives you 2 easily recognizable well-known algo steps, which can be extracted, named and tucked away into the lib/ folder, unburdening the app devs from this unnecessary confusing reading every time. They can likely be replaced with lodash and something like this: https://www.npmjs.com/package/humanize-duration. I am sure in 1/4 of century of js someone has already solved these 2 small problems :) Right now all this inline spaghetti just pollutes the domain logic.

rangeBounds() is basically finding min and max of an array - can be expressed as separate min() and max() or as a more efficient algo. Not that it's performance critical, but once we identified the algo, sorting just looks sloppy now. Could be reused probably - minmax is a useful algo.

humanizeDuration() - there's probably 50 packages for that. I would be to lazy to even start typing a home-grown implementation, especially inline.

If replaced with libraries - there's no code to maintain or type-check - this eliminates the need for this part of refactoring adventure.

The most type-safe code is no code

@victorlin
Copy link
Member Author

@ivan-aksamentov thanks for reviewing!

Right now all this inline spaghetti just pollutes the domain logic.

+1 for splitting into 2 parts especially with the verbosity of added error handling. I can look into that separately.

Each part is only ~5 lines of code that is working fine. I think easier to just maintain these in-house and not add extra dependencies. Even more so when considering a similar change in Auspice which is published as a library, where I think we should avoid unnecessary dependencies to improve the experience of using the package in another project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants