Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Transform strings #64

Merged
merged 5 commits into from
Apr 28, 2018
Merged

Feature: Transform strings #64

merged 5 commits into from
Apr 28, 2018

Conversation

willfarrell
Copy link
Contributor

@willfarrell willfarrell commented Apr 16, 2018

Supports the modification of strings: trimming, lowercase, uppercase, enumcase

Closes:

Issues I ran into:

  • Unable to coerce when only a string was passed in. Is it even possible to coerce a stand-alone value?
      data = '  String  ';
      schema = {type: 'string', coerce: ['trim', 'lowercase']};
      ajv.validate(schema, data) .should.equal(true);
      data.should.deep.equal('string');  // false

TODO:

  • Signoff on keyword name coerce, sanitize, transform, or other
  • Signoff on option names trim, trimleft, trimright, lowercase, uppercase, enumcase
  • identify any other options that should be included at this time
    • strip html tags
    • strip markdown
  • Idea: options to extend with custom methods?
  • Update README
  • Improve code coverage

Supports: trim, trimLeft, trimRight, lowercase, uppercase, enumcase
@coveralls
Copy link

coveralls commented Apr 16, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling 9416073 on willfarrell:feature/coerce into 847c781 on epoberezkin:master.

@willfarrell willfarrell changed the title Feature: Coerce strings (WIP) Feature: Coerce strings Apr 16, 2018
@epoberezkin
Copy link
Member

@willfarrell thank you, it's great! I was thinking a lot about how to call this keyword. "coerce" in JS means "change type" which is not what this keyword is doing, so I don't think we should use it - it would confuse people, particularly given that ...
My other versions were different combinations of "text"/"str(ing)" with "transform".
"transformString" is probably the most explicit but maybe a bit long.
What do you think? Do you have any better idea?

@epoberezkin
Copy link
Member

Maybe just "transform" is fine, it can even later evolve to support other types (with other transformations).

trim, trimleft, trimright, lowercase, uppercase - should we maybe just use JS string method names, at least while mutations are nowhere near the spec.

Still figuring out what enumcase does...

@epoberezkin
Copy link
Member

epoberezkin commented Apr 20, 2018

Unable to coerce when only a string was passed in. Is it even possible to coerce a stand-alone value?

No, the passed variable is updated but it is never passed back - nothing we can do about it, as JS does not support passing scalars by reference, it's not C.

EDIT: ajv could, in theory, pass it back in some way, e.g. via some function property (I know what people think about it, performance be damned :) But it's not issue of this package anyway.

@epoberezkin
Copy link
Member

identify any other options that should be included at this time
strip html tags
strip markdown

Unless you need it, I'd leave until there is demand, and stay with more generic methods. I was thinking split could be useful, but that requires some syntax extension(s), so I'd leave it to the future as well.

Idea: options to extend with custom methods?

I like the idea, maybe also postpone though, unless you have a use case where you need it. In this case the above features (html, markdown, etc) could be in a separate package - a plugin for a plugin...

@willfarrell
Copy link
Contributor Author

willfarrell commented Apr 20, 2018

Totally agree, coerce is the wrong word. The more I explain it to others and think about it, the more sanitize feels right due to it's focus on strings only right now. transform is also a good option, are you thinking a transform like "a,b,c," => ["a","b","c"] as something people might like? I'm in no rush to lock down a name, and have it as well thought out as possible.

Still figuring out what enumcase does...

The use case I'm building this for is; we have data scientist upload massive csv files. One column is units that we validate with enum and has a values like mg/L. One of our uploader's, a government agency, has all their data in UPPERCASE like MG/L which currently fails validation. So instead of having them find and replace every value by hand or us doing a custom pre-process, I had the idea the be able to sanitize MG/L to mg/L to meet our data integrity needs on the fly.

Unable to coerce when only a string was passed in. Is it even possible to coerce a stand-alone value?

It's a super fridge case in my opinion, I added it in the readme. If it comes up we can address it then. We can make an issue for it, so it's caught.

I've been reflecting on which transforms should be included, what their names should be, and how plugins could work.

First, shorten the list to the most common cases: trim, uppercase, lowercase, enumcase. These are pretty common names I feel (except enumcase), let's keep it simple for now.

Second, allow an object of functions to be passed in as an option for the keyword

const leftpad = require('left-pad')
// TODO allow options for each keywords to be passed in - idea WIP
defineKeywords(new Ajv, ['sanitize'], {
  sanitize: {
    trimleft: function (value, parentSchema) {
      return value.replace(/^[\s]+/, '')
    },
   'leftpad-zero': function(value, parentSchema) {
     return leftpad(value,10,  '0')
   }
  }
})
// jsonschema example: "sanitize": ["uppercase","split-comma"]

This would allow complete customization and flexibility for those transforms that need some extra configuration. Since these will mostly be straight forward transformations, or a simple wrapper like leftpad exampled above, I think we can avoid plugins for a plugin. Maybe save this for a second iteration.

After writing this; I think we should use sanitize for string -> string transforms and postpone creation of transform for any -> any due to it potentially becoming a complex a coerseType replacement.

   'split-comma': function(value, parentSchema) {
     return value.split(',').map(v=>v.trim()) // Due to the coerseType option, to be able to do this it might have to be after validation. needs to be more thoroughly thought out
   }

I'll update the PR to sanitize and shorten the list of allowed values. Once merged, we can create new issues for extending it to allow extensions / plugins / addons and another for transform to support split and what that might look like. Let them sit for a bit collecting ideas and interest.

@willfarrell willfarrell changed the title Feature: Coerce strings Feature: Sanitize strings Apr 20, 2018
@epoberezkin
Copy link
Member

Totally agree, coerce is the wrong word. The more I explain it to others and think about it, the more sanitize feels right due to it's focus on strings only right now. transform is also a good option, are you thinking a transform like "a,b,c," => ["a","b","c"] as something people might like? I'm in no rush to lock down a name, and have it as well thought out as possible.

I'm in favour of "transform", as "sanitize" feels more narrow (also British people will not like the spelling ;) - these transformations potentially have wider applicability.

Let's not rush into the decision about the extensions, let's close the base case first. I think the syntax decision should be made first, as instead of:

"transform": ["split-comma"]

I would prefer

transform: [{"split": ","}]

and

transform: {"split": {"pattern": "\s*,\s*"}}

so an extended syntax would be to pass array of objects (so order can be managed) and within each object some sorting should be agreed and defined (definitely not the order of keys in JSON). Also in case of single object array can possibly be dropped entirely, like in the second example.

If we adopt such syntax, then extensions will be a bit more generic as well. Also, please have a look at how extensions are passed to other keywords - e.g. dynamicDefaults.

In any case, let's not overload this PR with any extensions and do simple thing first.

"transform" for stings only with "trim", "trimLeft/Right", "toUpperCase", "toLowerCase" (will also make it easier to map to JS methods - they're just their names...). Any reason what is bad about it?

@epoberezkin
Copy link
Member

"enumCase" - still looking...

@epoberezkin
Copy link
Member

Ok, I understood now how "toEnumCase" :) works. I missed that enum "uniqueness" limitation and thought that any values are allowed. You also need to check that all "enum" values are strings - they can be anything and JS will automatically coerce, we don't need that, should just throw.

@epoberezkin
Copy link
Member

If you dislike JS methods and camelCase then let's use hyphen-case, like in formats: "trim-right", "upper-case" etc.

@willfarrell
Copy link
Contributor Author

transform it is then, don't want to piss of the British.

Took a look at dynamicDefaults, how it does options, works for me. I like how it keeps the options in the schema, clearly, well thought out.

Naming: "trim", "trimLeft/Right", "toUpperCase", "toLowerCase", "toEnumCase". I was on the fence on which way to go. JS method names work for me. I'll update.

You also need to check that all "enum" values are strings

Good catch! I feel we should ignore non-strings during to transform process, coerceTypes should handle any type change.

- tweaked option names to match js method names stye
- added in fix for non-string types
@epoberezkin
Copy link
Member

epoberezkin commented Apr 20, 2018

transform it is then, don't want to piss of the British.

lol

I feel we should ignore non-strings during to transform process, coerceTypes should handle any type change.

correct, also, given that the keyword definition limits it to strings, everything else will be not passed to this keyword, so you don't need to specifically check that the passed value is a string, you can trust ajv on that :). At least, not until we make it mutli-type (I can see some use cases, but let's see if there is demand).

@willfarrell
Copy link
Contributor Author

At least, not until we make it mutli-type (I can see some use cases, but let's see if there is demand).

multi-type test included

@willfarrell willfarrell changed the title Feature: Sanitize strings Feature: Transform strings Apr 20, 2018
@willfarrell
Copy link
Contributor Author

willfarrell commented Apr 25, 2018

I was testing it out today, works well for nodeJS where one loads keywords in explicitly (as per the tests). However, when I compile with ajv-pack for front end use I get ReferenceError: customRule0 is not definedwhen I running validate(data). Wondering if there is something else I need to do to this PR to allow it to work with avj-pack.

Compiling:

const schema = require('../dist/json-schema')

const Ajv = require('ajv')
const pack = require('ajv-pack')

const ajv = new Ajv({
  v5: true,
  format: 'full',
  coerceTypes: true,
  allErrors: true,
  useDefaults: true,
  sourceCode: true
})
require('ajv-keywords')(ajv, ['transform'])
const validate = ajv.compile(schema)
const moduleCode = pack(ajv, validate)
// save to file to include else where

Unit Test:

  it('should transform after being packed', function() {
    var schema, data
    data = {o: '  Object  '};
    schema = {type: 'object', properties: {o: {type: 'string', transform: ['trim', 'toLowerCase']}}};

    var ajv = defineKeywords(new Ajv, 'transform')
    var validate = ajv.compile(schema);
    var moduleCode = ajvPack(ajv, validate);  // line 24

    var packedValidate = requireFromString(moduleCode)
    packedValidate(data).should.equal(true)
    data.should.deep.equal({o:'object'});
  })
/*
TypeError: Cannot read property 'patterns' of undefined
      at Object.generate_gen_validate [as gen_validate] (node_modules/ajv-pack/lib/dotjs/gen_validate.js:4:33)
      at generate_gen_single (node_modules/ajv-pack/lib/dotjs/gen_single.js:6:19)
      at module.exports (node_modules/ajv-pack/lib/pack_validate.js:9:14)
      at Context.<anonymous> (spec/transform.spec.js:24:22)
*/

Edit:
I think I found it, I need a /keywords/dots/transform.jst file ... I'll get started, but I may need some help here.

@epoberezkin
Copy link
Member

@willfarrell ajv-pack only supports inline (and probably macro) keywords. Rather than making this one inline - it won’t be extensible if you do it - it could be more valuable to figure out how to make ajv-pack support other keywords types.

Sorry didn’t merge yet - really busy, will review and merge over weekend.

@willfarrell
Copy link
Contributor Author

willfarrell commented Apr 26, 2018

Was looking into ajv-pack and dotjs last night. I came to the same conclusion. Would be nice to have but I'm sure most can survive without till then. Let me know if I can help with the implementation. I'll update the readme to reflect this.

No worries on the merge delay, big PRs sometimes need a week to catch any loose ends and flush out the idea. I'll be available most of the weekend, so hopefully, I can reply quickly if you need to follow up. Excited to get this in. Thanks for all the help, and the time you take for this project is much appreciated.

@epoberezkin epoberezkin merged commit 9416073 into ajv-validator:master Apr 28, 2018
@epoberezkin
Copy link
Member

@willfarrell thank you. I only removed "schema" from the definition, it is only used for "validate" keywords.

@marqu3z
Copy link

marqu3z commented May 16, 2018

Hi guys,

Is it an intended behaviour to apply transformation after validations and not before?
Issue ajv-validator/ajv#741 has been closed but it is actually not solved.

const Ajv = require('ajv')
const ajv = new Ajv
require('ajv-keywords')(ajv, ['transform'])

const schema = {
  type: 'object',
  properties: {
    name: {type: 'string', format: 'email', transform: ['trim', 'toLowerCase']}
  }
}

const data = {name: ' FoO@BaR.CoM '}
ajv.validate(schema, data) // false
data // { name: ' FoO@BaR.CoM ' }

To fix this the allOf must be used to force the order of the schemas, and apply transformation before validation.

type: 'string', allOf: [
        {transform: ['trim', 'toLowerCase']},
        {format: 'email'}
      ]

This trick was the workaround proposed for almost all of the issues opened about this topic.

In my use cases i never had the need to transform values after validations but always before.
This does not mean it is always the right order for all use cases and it would end up being an opinionated decision on how to use this feature, but choosing to apply it after validation is an opinionated choice as well and it does not solve a common use case like the above one.

Other things i noticed:

The doc says the transformation is applied before validaton:
https://github.com/epoberezkin/ajv-keywords/pull/64/files#diff-04c6e90faac2675aa89e2176d2eec7d8R626

The doc has a wrong key in the example lowercase instead of toLowerCase
https://github.com/epoberezkin/ajv-keywords/pull/64/files#diff-04c6e90faac2675aa89e2176d2eec7d8R652

@willfarrell
Copy link
Contributor Author

The intended order of execution is indeed that transform happend before validation. I'm already using it in production, so it is working. However I can confirm in your specific usecase that it is failing to be executed at all. I would suggest opening this as a seperate issue for this.

const Ajv = require('ajv')
const ajv = new Ajv
require('ajv-keywords')(ajv, ['transform'])

const schema = {
  type: 'object',
  properties: {
    name: {
      type: 'string',
      allOf:  [
        {format: 'email'},
        {transform: ['trim', 'toLowerCase']} // not executed
      ]
    }
  }
}

const data = {name: ' FoO@BaR.CoM '}
const valid = ajv.validate(schema, data)
console.log(valid, data) // false { name: ' FoO@BaR.CoM ' }

The allOf workaround is not transform specific. I'll leave it to @epoberezkin to speak to the inner working, and why that may be the case.

Good catch on the docs, we went through a few versions of naming conventions until we settled on the current names. See #69 for the update.

@epoberezkin
Copy link
Member

In general I was always against allowing to manage keywords execution order, JSON schema spec goes further saying that the results should not be dependent on the order of execution.

Ajv obviously allows extending the spec but also guarantees that allOf are executed in the array items order, so from the point of “explicit over implicit” principle it is not a workaround - it’s the correct usage, that highlights the fact that in this particular case the execution order is important and it’s visible in the schema, not only in the code that defines Ajv instance. If you care about other platforms at all, adding $comment in such places may help as well.

Allowing to manage the execution order of keywords at the point where the keyword is defined would make already not portable schema (across languages) even less portable - you would have to make sure that all apps/services/UI clients that use this schema define keywords in the same way for it to work. With allOf it’s easier.

I will think more about it :)

@ruchimutneja
Copy link

identify any other options that should be included at this time
strip html tags
strip markdown

Unless you need it, I'd leave until there is demand, and stay with more generic methods. I was thinking split could be useful, but that requires some syntax extension(s), so I'd leave it to the future as well.

Idea: options to extend with custom methods?

I like the idea, maybe also postpone though, unless you have a use case where you need it. In this case the above features (html, markdown, etc) could be in a separate package - a plugin for a plugin...

Is there any work around stripping the html tags?

@willfarrell
Copy link
Contributor Author

@ruchimutneja None that I know of. We are tracking the addition of that feature at #66. It's something I'd like to build this year. But time will tell. To prevent the bloat of this package, we may have to create a plugin approach just for it.

I don't know your specific case, but likely it's only one or maybe 2 fields that would require html sanitization. In this case, I would suggest you run your own custom transform script over the particular parameter on the output of avj.

Hope this helps.

@ruchimutneja
Copy link

ruchimutneja commented Feb 13, 2019

@willfarrell, I have many rich text fields and need to strip tags before validating the data.

const ajv = new Ajv();
const schema = {
    type: 'array',
    items: {
      type:'string',
      transform:['toLowerCase']
    }
  };
const data = ['  MixCase  '];
ajv.validate(schema, data);
console.log(data);

I tried this script just to transform my string to lowercase. will it be mutating the data? bcz my output is the same, its not getting converted to lower case.

@willfarrell
Copy link
Contributor Author

@ruchimutneja if you have many, it might now be as feasible.

Do you have the latest version? try v3.2.0 or v3.4.0, there was a transform bug in v3.3.0
Does npm test run successfully? I ask because a nearly exact test case exists for the code snippet you posted.

transform does mutate the data before validation.

@epoberezkin
Copy link
Member

epoberezkin commented Feb 13, 2019

@ruchimutneja please see the docs on how to use this package - you are not adding keywords to the instance in your code sample.

@ruchimutneja
Copy link

const Ajv = require('ajv');
const ajv = new Ajv;
require('ajv-keywords')(ajv, 'transform');
const schema = {
    type: 'string',
    transform: ['toLowerCase']
};
const data = 'MixCase';
const validate = ajv.compile(schema);
validate(data);
console.log(validate.errors);
console.log(data);

Did this, not working for me. Not sure what I am missing here.

@ruchimutneja
Copy link

the issue is, I need to pass the data in array format, even a single string I had to pass as ["MixCase"] and changed schema accordingly.

@epoberezkin
Copy link
Member

@ruchimutneja you need to review JSON schema docs, that's definitely not the right place to ask advice on JSON schema usage - please search/ask on StackOverflow.

@epoberezkin
Copy link
Member

Re scalar vs array - it's covered in docs, strings are passed by value not by reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants