Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.info() method in class Sequence #49

Open
8 tasks
VascoSch92 opened this issue Apr 14, 2024 · 7 comments
Open
8 tasks

.info() method in class Sequence #49

VascoSch92 opened this issue Apr 14, 2024 · 7 comments
Labels
enhancement improvements to existing features that do not introduce new functionality question Further information is requested

Comments

@VascoSch92
Copy link
Owner

VascoSch92 commented Apr 14, 2024

Add a .info() method to the class Sequence which print all the relevant information about the chose sequence.
Infornation to include:

  • Name of the sequence
  • Names of the classes implementing the sequence
  • First 5 (?) terms of the sequence
  • is finite? If yes also the length
  • Is periodic
  • Reference (?)
  • Is mononotonic increasing? Increasing? Constant?
  • Is strictly positive? Positive?
    ...

Open question:

  • what should be the format of the output?

API:

@property
def info(self) -> str:
    ...
@VascoSch92 VascoSch92 added the enhancement improvements to existing features that do not introduce new functionality label Apr 14, 2024
@VascoSch92 VascoSch92 changed the title Info method in Core .info() method in class Sequence Apr 14, 2024
@VascoSch92 VascoSch92 added the question Further information is requested label Apr 14, 2024
@fkiraly
Copy link

fkiraly commented Apr 14, 2024

If we were using scikit-base, as proposed in #45, most of these I'd model as tags, which you get as a dict via get_tags, and one can use for search and retrieval via all_objects.

e.g.,

class Fibonacci():

_tags = {
    "name": "Fibonacci Sequence",
    "OEIS": True,
    "OEIS_ID": "A000045",
    "is_finite": False,
    "is_periodic": False,
    "is_monotonic_increasing": True,
}

The benefit is, for a new property, a new tag is added easily.

I am also thinking about print_tags or similar, at least for sktime, for a version with explanation of each tag, and meta-tags or tag documentation, see this PR for a draft design covering the second half of this: sktime/sktime#6289

Things I would not make tags or methods:

  • references, I'd include them in docstring in a Reference section, and use sth like numpydoc/rst.
  • names of classes implementing - why would ther be multiple?
    • either way, related classes can be referenced in the docstring, unless we need to query it at runtime

@fkiraly
Copy link

fkiraly commented Apr 15, 2024

Taking inspiration from this discussion, here's an experimental design that allows tags to be accessed as attributes, in BaseObject descendants:

sktime/skbase#313

Reviews of the idea appreciated. This would allow sth like

fi = Fibonacci()

fi.is_periodic
# False

if the tags are set in the dict, as above.

The advantage above attributes, or properties:

  • the "tags" are localized in one place, separate from other attributes
  • they are also bundled in the code, in one place (while one could do that with attribs too, the dict enforces it)

Disadvantage is that it is more custom, and not entirely "python zen".

Not too strong arguments either way imo, opinions welcome (especially critical ones).

@VascoSch92
Copy link
Owner Author

VascoSch92 commented Apr 15, 2024

The benefit is, for a new property, a new tag is added easily.

I see your point. The problem is that I don't know if it is scalable. Let's say we want to delete the property is_finite. With the tag method I have to go through all the 50+ sequences that are already implemented and delete it from _tags. In the current setting. I can just delete 3 methods in 3 different classes and it is done. Same thing if we want to add a new property. I saw that tags are more adapt to a skpro setting where you use it as a label to classify/order statistical models.

names of classes implementing - why would ther be multiple?

The problem is that a sequence of number can have different names. For example, the sequence
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,...
is called the Fibonacci sequence, or the Fibonacci numbers, or the sequence A000045. Therefore, you can instantiate it using the classes A000045, FibonacciNumbers or FibonacciSequence. Same classes but different names.

@fkiraly
Copy link

fkiraly commented Apr 17, 2024

In the current setting. I can just delete 3 methods in 3 different classes and it is done. Same thing if we want to add a new property.

Can you explain that?

I agree that if there is some default logic that infers the property, it's probably not good to make it a tag.
Though, what is this being inferred from? Is being finite not an inherent property of the sequence?

In case you are worried about inheritance: the tag system in scikit-base does support inheritance. That is, if all descendants of a class have a parent where a tag is set, but do not set the tag themselves, then they will behave as if they had the ancestor's tag.

@fkiraly
Copy link

fkiraly commented Apr 17, 2024

Same classes but different names.

I would be extra cautious and would strongly recommend not to introduce different class names.

Yes, an object can have different names in literature, that's the same in stats/ML, e.g., Kriging, Gaussian process regression, kernel ridge regression with error variance fitting, all refer to the same algorithm (including parameters) up to renaming of symbols.

I think dealing with different names is more a matter of documentation and tagging, rather than creating multiple classes though, because classes live on the software architecture level, and proliferating references is anti-DRY.

Where I could understand different classes for the same mathematical object is OEIS vs Fibonacci, but here I would argue that the logical object differs, in parameterization. That is, OEIS("A00045") happens to be the same as Fibonacci mathematically, but it is so in different parameterization. OEIS is distinct from Fibonacci, as it parameterizes sequences by the OEIS register identifier.

@VascoSch92
Copy link
Owner Author

In case you are worried about inheritance: the tag system in scikit-base does support inheritance. That is, if all descendants of a class have a parent where a tag is set, but do not set the tag themselves, then they will behave as if they had the ancestor's tag.

Ah ok i didn't know. Is there a documentation for this class?

I would be extra cautious and would strongly recommend not to introduce different class names.

The idea was just to make the package more user-firendly.

@fkiraly
Copy link

fkiraly commented Apr 18, 2024

The idea was just to make the package more user-firendly.

I get that, but I would argue that proliferation of names makes it less user friendly actually.

See https://peps.python.org/pep-0020/

Violates "There should be one-- and preferably only one --obvious way to do it."

The "why" goes along the lines, users will be confused about which one to pick, and will start wondering whether these are indeed just aliases. There is also a higher risk of errors as you allow inconsistency of imports across different locations. For classes identity matters too, so either you have multiple distinct classes (which is bad as identical objects can arise from non-identical classes, how do you check instances), or all except one of the classes will point to one other class, which users will not expect, in which case one can easily argue for leaving just that one special class in place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement improvements to existing features that do not introduce new functionality question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants