Feature: Transform strings #64

willfarrell · 2018-04-16T05:24:15Z

Supports the modification of strings: trimming, lowercase, uppercase, enumcase

Closes:

trim, lowercase, ... ajv#705 - trim & lowercase
Flag to trim or not string field values before validating ajv#741 - trim
Is it support convert letter to upper-case? ajv#742 - uppercase
Idea: Coerce value case to match enum value case ajv#755 - enumcase

Issues I ran into:

Unable to coerce when only a string was passed in. Is it even possible to coerce a stand-alone value?

      data = '  String  ';
      schema = {type: 'string', coerce: ['trim', 'lowercase']};
      ajv.validate(schema, data) .should.equal(true);
      data.should.deep.equal('string');  // false

TODO:

Signoff on keyword name coerce, sanitize, transform, or other
Signoff on option names trim, trimleft, trimright, lowercase, uppercase, enumcase
identify any other options that should be included at this time
- strip html tags
- strip markdown
Idea: options to extend with custom methods?
Update README
Improve code coverage

Supports: trim, trimLeft, trimRight, lowercase, uppercase, enumcase

coveralls · 2018-04-16T05:25:26Z

Coverage remained the same at 100.0% when pulling 9416073 on willfarrell:feature/coerce into 847c781 on epoberezkin:master.

epoberezkin · 2018-04-20T13:52:05Z

@willfarrell thank you, it's great! I was thinking a lot about how to call this keyword. "coerce" in JS means "change type" which is not what this keyword is doing, so I don't think we should use it - it would confuse people, particularly given that ...
My other versions were different combinations of "text"/"str(ing)" with "transform".
"transformString" is probably the most explicit but maybe a bit long.
What do you think? Do you have any better idea?

epoberezkin · 2018-04-20T13:56:48Z

Maybe just "transform" is fine, it can even later evolve to support other types (with other transformations).

trim, trimleft, trimright, lowercase, uppercase - should we maybe just use JS string method names, at least while mutations are nowhere near the spec.

Still figuring out what enumcase does...

epoberezkin · 2018-04-20T13:58:03Z

Unable to coerce when only a string was passed in. Is it even possible to coerce a stand-alone value?

No, the passed variable is updated but it is never passed back - nothing we can do about it, as JS does not support passing scalars by reference, it's not C.

EDIT: ajv could, in theory, pass it back in some way, e.g. via some function property (I know what people think about it, performance be damned :) But it's not issue of this package anyway.

epoberezkin · 2018-04-20T14:03:46Z

identify any other options that should be included at this time
strip html tags
strip markdown

Unless you need it, I'd leave until there is demand, and stay with more generic methods. I was thinking split could be useful, but that requires some syntax extension(s), so I'd leave it to the future as well.

Idea: options to extend with custom methods?

I like the idea, maybe also postpone though, unless you have a use case where you need it. In this case the above features (html, markdown, etc) could be in a separate package - a plugin for a plugin...

willfarrell · 2018-04-20T15:33:45Z

Totally agree, coerce is the wrong word. The more I explain it to others and think about it, the more sanitize feels right due to it's focus on strings only right now. transform is also a good option, are you thinking a transform like "a,b,c," => ["a","b","c"] as something people might like? I'm in no rush to lock down a name, and have it as well thought out as possible.

Still figuring out what enumcase does...

The use case I'm building this for is; we have data scientist upload massive csv files. One column is units that we validate with enum and has a values like mg/L. One of our uploader's, a government agency, has all their data in UPPERCASE like MG/L which currently fails validation. So instead of having them find and replace every value by hand or us doing a custom pre-process, I had the idea the be able to sanitize MG/L to mg/L to meet our data integrity needs on the fly.

Unable to coerce when only a string was passed in. Is it even possible to coerce a stand-alone value?

It's a super fridge case in my opinion, I added it in the readme. If it comes up we can address it then. We can make an issue for it, so it's caught.

I've been reflecting on which transforms should be included, what their names should be, and how plugins could work.

First, shorten the list to the most common cases: trim, uppercase, lowercase, enumcase. These are pretty common names I feel (except enumcase), let's keep it simple for now.

Second, allow an object of functions to be passed in as an option for the keyword

const leftpad = require('left-pad')
// TODO allow options for each keywords to be passed in - idea WIP
defineKeywords(new Ajv, ['sanitize'], {
  sanitize: {
    trimleft: function (value, parentSchema) {
      return value.replace(/^[\s]+/, '')
    },
   'leftpad-zero': function(value, parentSchema) {
     return leftpad(value,10,  '0')
   }
  }
})
// jsonschema example: "sanitize": ["uppercase","split-comma"]

This would allow complete customization and flexibility for those transforms that need some extra configuration. Since these will mostly be straight forward transformations, or a simple wrapper like leftpad exampled above, I think we can avoid plugins for a plugin. Maybe save this for a second iteration.

After writing this; I think we should use sanitize for string -> string transforms and postpone creation of transform for any -> any due to it potentially becoming a complex a coerseType replacement.

   'split-comma': function(value, parentSchema) {
     return value.split(',').map(v=>v.trim()) // Due to the coerseType option, to be able to do this it might have to be after validation. needs to be more thoroughly thought out
   }

I'll update the PR to sanitize and shorten the list of allowed values. Once merged, we can create new issues for extending it to allow extensions / plugins / addons and another for transform to support split and what that might look like. Let them sit for a bit collecting ideas and interest.

epoberezkin · 2018-04-20T15:58:46Z

Totally agree, coerce is the wrong word. The more I explain it to others and think about it, the more sanitize feels right due to it's focus on strings only right now. transform is also a good option, are you thinking a transform like "a,b,c," => ["a","b","c"] as something people might like? I'm in no rush to lock down a name, and have it as well thought out as possible.

I'm in favour of "transform", as "sanitize" feels more narrow (also British people will not like the spelling ;) - these transformations potentially have wider applicability.

Let's not rush into the decision about the extensions, let's close the base case first. I think the syntax decision should be made first, as instead of:

"transform": ["split-comma"]

I would prefer

transform: [{"split": ","}]

and

transform: {"split": {"pattern": "\s*,\s*"}}

so an extended syntax would be to pass array of objects (so order can be managed) and within each object some sorting should be agreed and defined (definitely not the order of keys in JSON). Also in case of single object array can possibly be dropped entirely, like in the second example.

If we adopt such syntax, then extensions will be a bit more generic as well. Also, please have a look at how extensions are passed to other keywords - e.g. dynamicDefaults.

In any case, let's not overload this PR with any extensions and do simple thing first.

"transform" for stings only with "trim", "trimLeft/Right", "toUpperCase", "toLowerCase" (will also make it easier to map to JS methods - they're just their names...). Any reason what is bad about it?

epoberezkin · 2018-04-20T15:59:01Z

"enumCase" - still looking...

epoberezkin · 2018-04-20T16:10:36Z

Ok, I understood now how "toEnumCase" :) works. I missed that enum "uniqueness" limitation and thought that any values are allowed. You also need to check that all "enum" values are strings - they can be anything and JS will automatically coerce, we don't need that, should just throw.

epoberezkin · 2018-04-20T16:14:59Z

If you dislike JS methods and camelCase then let's use hyphen-case, like in formats: "trim-right", "upper-case" etc.

willfarrell · 2018-04-20T16:49:04Z

transform it is then, don't want to piss of the British.

Took a look at dynamicDefaults, how it does options, works for me. I like how it keeps the options in the schema, clearly, well thought out.

Naming: "trim", "trimLeft/Right", "toUpperCase", "toLowerCase", "toEnumCase". I was on the fence on which way to go. JS method names work for me. I'll update.

You also need to check that all "enum" values are strings

Good catch! I feel we should ignore non-strings during to transform process, coerceTypes should handle any type change.

- tweaked option names to match js method names stye - added in fix for non-string types

epoberezkin · 2018-04-20T16:56:12Z

transform it is then, don't want to piss of the British.

lol

I feel we should ignore non-strings during to transform process, coerceTypes should handle any type change.

correct, also, given that the keyword definition limits it to strings, everything else will be not passed to this keyword, so you don't need to specifically check that the passed value is a string, you can trust ajv on that :). At least, not until we make it mutli-type (I can see some use cases, but let's see if there is demand).

willfarrell · 2018-04-20T16:57:13Z

At least, not until we make it mutli-type (I can see some use cases, but let's see if there is demand).

multi-type test included

willfarrell · 2018-04-25T23:04:35Z

I was testing it out today, works well for nodeJS where one loads keywords in explicitly (as per the tests). However, when I compile with ajv-pack for front end use I get ReferenceError: customRule0 is not definedwhen I running validate(data). Wondering if there is something else I need to do to this PR to allow it to work with avj-pack.

Compiling:

const schema = require('../dist/json-schema')

const Ajv = require('ajv')
const pack = require('ajv-pack')

const ajv = new Ajv({
  v5: true,
  format: 'full',
  coerceTypes: true,
  allErrors: true,
  useDefaults: true,
  sourceCode: true
})
require('ajv-keywords')(ajv, ['transform'])
const validate = ajv.compile(schema)
const moduleCode = pack(ajv, validate)
// save to file to include else where

Unit Test:

  it('should transform after being packed', function() {
    var schema, data
    data = {o: '  Object  '};
    schema = {type: 'object', properties: {o: {type: 'string', transform: ['trim', 'toLowerCase']}}};

    var ajv = defineKeywords(new Ajv, 'transform')
    var validate = ajv.compile(schema);
    var moduleCode = ajvPack(ajv, validate);  // line 24

    var packedValidate = requireFromString(moduleCode)
    packedValidate(data).should.equal(true)
    data.should.deep.equal({o:'object'});
  })
/*
TypeError: Cannot read property 'patterns' of undefined
      at Object.generate_gen_validate [as gen_validate] (node_modules/ajv-pack/lib/dotjs/gen_validate.js:4:33)
      at generate_gen_single (node_modules/ajv-pack/lib/dotjs/gen_single.js:6:19)
      at module.exports (node_modules/ajv-pack/lib/pack_validate.js:9:14)
      at Context.<anonymous> (spec/transform.spec.js:24:22)
*/

Edit:
I think I found it, I need a /keywords/dots/transform.jst file ... I'll get started, but I may need some help here.

epoberezkin · 2018-04-26T05:57:18Z

@willfarrell ajv-pack only supports inline (and probably macro) keywords. Rather than making this one inline - it won’t be extensible if you do it - it could be more valuable to figure out how to make ajv-pack support other keywords types.

Sorry didn’t merge yet - really busy, will review and merge over weekend.

willfarrell · 2018-04-26T11:57:24Z

Was looking into ajv-pack and dotjs last night. I came to the same conclusion. Would be nice to have but I'm sure most can survive without till then. Let me know if I can help with the implementation. I'll update the readme to reflect this.

No worries on the merge delay, big PRs sometimes need a week to catch any loose ends and flush out the idea. I'll be available most of the weekend, so hopefully, I can reply quickly if you need to follow up. Excited to get this in. Thanks for all the help, and the time you take for this project is much appreciated.

epoberezkin · 2018-04-28T09:17:16Z

@willfarrell thank you. I only removed "schema" from the definition, it is only used for "validate" keywords.

marqu3z · 2018-05-16T10:19:25Z

Hi guys,

Is it an intended behaviour to apply transformation after validations and not before?
Issue ajv-validator/ajv#741 has been closed but it is actually not solved.

const Ajv = require('ajv')
const ajv = new Ajv
require('ajv-keywords')(ajv, ['transform'])

const schema = {
  type: 'object',
  properties: {
    name: {type: 'string', format: 'email', transform: ['trim', 'toLowerCase']}
  }
}

const data = {name: ' FoO@BaR.CoM '}
ajv.validate(schema, data) // false
data // { name: ' FoO@BaR.CoM ' }

To fix this the allOf must be used to force the order of the schemas, and apply transformation before validation.

type: 'string', allOf: [
        {transform: ['trim', 'toLowerCase']},
        {format: 'email'}
      ]

This trick was the workaround proposed for almost all of the issues opened about this topic.

In my use cases i never had the need to transform values after validations but always before.
This does not mean it is always the right order for all use cases and it would end up being an opinionated decision on how to use this feature, but choosing to apply it after validation is an opinionated choice as well and it does not solve a common use case like the above one.

Other things i noticed:

The doc says the transformation is applied before validaton:
https://github.com/epoberezkin/ajv-keywords/pull/64/files#diff-04c6e90faac2675aa89e2176d2eec7d8R626

The doc has a wrong key in the example lowercase instead of toLowerCase
https://github.com/epoberezkin/ajv-keywords/pull/64/files#diff-04c6e90faac2675aa89e2176d2eec7d8R652

willfarrell · 2018-05-16T13:41:23Z

The intended order of execution is indeed that transform happend before validation. I'm already using it in production, so it is working. However I can confirm in your specific usecase that it is failing to be executed at all. I would suggest opening this as a seperate issue for this.

const Ajv = require('ajv')
const ajv = new Ajv
require('ajv-keywords')(ajv, ['transform'])

const schema = {
  type: 'object',
  properties: {
    name: {
      type: 'string',
      allOf:  [
        {format: 'email'},
        {transform: ['trim', 'toLowerCase']} // not executed
      ]
    }
  }
}

const data = {name: ' FoO@BaR.CoM '}
const valid = ajv.validate(schema, data)
console.log(valid, data) // false { name: ' FoO@BaR.CoM ' }

The allOf workaround is not transform specific. I'll leave it to @epoberezkin to speak to the inner working, and why that may be the case.

Good catch on the docs, we went through a few versions of naming conventions until we settled on the current names. See #69 for the update.

epoberezkin · 2018-05-16T19:03:04Z

In general I was always against allowing to manage keywords execution order, JSON schema spec goes further saying that the results should not be dependent on the order of execution.

Ajv obviously allows extending the spec but also guarantees that allOf are executed in the array items order, so from the point of “explicit over implicit” principle it is not a workaround - it’s the correct usage, that highlights the fact that in this particular case the execution order is important and it’s visible in the schema, not only in the code that defines Ajv instance. If you care about other platforms at all, adding $comment in such places may help as well.

Allowing to manage the execution order of keywords at the point where the keyword is defined would make already not portable schema (across languages) even less portable - you would have to make sure that all apps/services/UI clients that use this schema define keywords in the same way for it to work. With allOf it’s easier.

I will think more about it :)

ruchimutneja · 2019-02-12T15:28:21Z

identify any other options that should be included at this time
strip html tags
strip markdown

Unless you need it, I'd leave until there is demand, and stay with more generic methods. I was thinking split could be useful, but that requires some syntax extension(s), so I'd leave it to the future as well.

Idea: options to extend with custom methods?

I like the idea, maybe also postpone though, unless you have a use case where you need it. In this case the above features (html, markdown, etc) could be in a separate package - a plugin for a plugin...

Is there any work around stripping the html tags?

willfarrell · 2019-02-13T05:13:58Z

@ruchimutneja None that I know of. We are tracking the addition of that feature at #66. It's something I'd like to build this year. But time will tell. To prevent the bloat of this package, we may have to create a plugin approach just for it.

I don't know your specific case, but likely it's only one or maybe 2 fields that would require html sanitization. In this case, I would suggest you run your own custom transform script over the particular parameter on the output of avj.

Hope this helps.

ruchimutneja · 2019-02-13T06:43:43Z

@willfarrell, I have many rich text fields and need to strip tags before validating the data.

const ajv = new Ajv();
const schema = {
    type: 'array',
    items: {
      type:'string',
      transform:['toLowerCase']
    }
  };
const data = ['  MixCase  '];
ajv.validate(schema, data);
console.log(data);

I tried this script just to transform my string to lowercase. will it be mutating the data? bcz my output is the same, its not getting converted to lower case.

willfarrell · 2019-02-13T06:49:58Z

@ruchimutneja if you have many, it might now be as feasible.

Do you have the latest version? try v3.2.0 or v3.4.0, there was a transform bug in v3.3.0
Does npm test run successfully? I ask because a nearly exact test case exists for the code snippet you posted.

transform does mutate the data before validation.

epoberezkin · 2019-02-13T07:05:14Z

@ruchimutneja please see the docs on how to use this package - you are not adding keywords to the instance in your code sample.

ruchimutneja · 2019-02-13T08:00:08Z

const Ajv = require('ajv');
const ajv = new Ajv;
require('ajv-keywords')(ajv, 'transform');
const schema = {
    type: 'string',
    transform: ['toLowerCase']
};
const data = 'MixCase';
const validate = ajv.compile(schema);
validate(data);
console.log(validate.errors);
console.log(data);

Did this, not working for me. Not sure what I am missing here.

ruchimutneja · 2019-02-13T08:42:23Z

the issue is, I need to pass the data in array format, even a single string I had to pass as ["MixCase"] and changed schema accordingly.

epoberezkin · 2019-02-13T12:31:04Z

@ruchimutneja you need to review JSON schema docs, that's definitely not the right place to ask advice on JSON schema usage - please search/ask on StackOverflow.

epoberezkin · 2019-02-13T12:32:42Z

Re scalar vs array - it's covered in docs, strings are passed by value not by reference.

feat: add in new keyword coerce

bca7627

Supports: trim, trimLeft, trimRight, lowercase, uppercase, enumcase

willfarrell added 2 commits April 15, 2018 23:34

test: add test for compile errors

46ee25e

docs: write docs for coerce keyword

8d3fc30

willfarrell changed the title ~~Feature: Coerce strings (WIP)~~ Feature: Coerce strings Apr 16, 2018

willfarrell changed the title ~~Feature: Coerce strings~~ Feature: Sanitize strings Apr 20, 2018

refactor: change name to transform

604c8cf

- tweaked option names to match js method names stye - added in fix for non-string types

willfarrell changed the title ~~Feature: Sanitize strings~~ Feature: Transform strings Apr 20, 2018

This was referenced Apr 23, 2018

transform Addtions (WIP) #66

Open

Update ajv-keywords datastreamapp/schema#6

Closed

willfarrell mentioned this pull request Apr 25, 2018

transform when packed causes ReferenceError: customRule0 is not defined datastreamapp/schema#8

Closed

docs: update readme to reflect ajv-pack usage

9416073

epoberezkin merged commit 9416073 into ajv-validator:master Apr 28, 2018

willfarrell mentioned this pull request Apr 30, 2018

feat: add in i18n support for validator middyjs/middy#161

Merged

This was referenced Apr 30, 2018

trim, lowercase, ... ajv-validator/ajv#705

Closed

Flag to trim or not string field values before validating ajv-validator/ajv#741

Closed

Is it support convert letter to upper-case? ajv-validator/ajv#742

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Transform strings #64

Feature: Transform strings #64

willfarrell commented Apr 16, 2018 •

edited

Loading

coveralls commented Apr 16, 2018 •

edited

Loading

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018 •

edited

Loading

epoberezkin commented Apr 20, 2018

willfarrell commented Apr 20, 2018 •

edited

Loading

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018

willfarrell commented Apr 20, 2018

epoberezkin commented Apr 20, 2018 •

edited

Loading

willfarrell commented Apr 20, 2018

willfarrell commented Apr 25, 2018 •

edited

Loading

epoberezkin commented Apr 26, 2018

willfarrell commented Apr 26, 2018 •

edited

Loading

epoberezkin commented Apr 28, 2018

marqu3z commented May 16, 2018 •

edited

Loading

willfarrell commented May 16, 2018

epoberezkin commented May 16, 2018

ruchimutneja commented Feb 12, 2019

willfarrell commented Feb 13, 2019

ruchimutneja commented Feb 13, 2019 •

edited

Loading

willfarrell commented Feb 13, 2019

epoberezkin commented Feb 13, 2019 •

edited

Loading

ruchimutneja commented Feb 13, 2019

ruchimutneja commented Feb 13, 2019

epoberezkin commented Feb 13, 2019

epoberezkin commented Feb 13, 2019

Feature: Transform strings #64

Feature: Transform strings #64

Conversation

willfarrell commented Apr 16, 2018 • edited Loading

coveralls commented Apr 16, 2018 • edited Loading

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018 • edited Loading

epoberezkin commented Apr 20, 2018

willfarrell commented Apr 20, 2018 • edited Loading

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018

epoberezkin commented Apr 20, 2018

willfarrell commented Apr 20, 2018

epoberezkin commented Apr 20, 2018 • edited Loading

willfarrell commented Apr 20, 2018

willfarrell commented Apr 25, 2018 • edited Loading

epoberezkin commented Apr 26, 2018

willfarrell commented Apr 26, 2018 • edited Loading

epoberezkin commented Apr 28, 2018

marqu3z commented May 16, 2018 • edited Loading

willfarrell commented May 16, 2018

epoberezkin commented May 16, 2018

ruchimutneja commented Feb 12, 2019

willfarrell commented Feb 13, 2019

ruchimutneja commented Feb 13, 2019 • edited Loading

willfarrell commented Feb 13, 2019

epoberezkin commented Feb 13, 2019 • edited Loading

ruchimutneja commented Feb 13, 2019

ruchimutneja commented Feb 13, 2019

epoberezkin commented Feb 13, 2019

epoberezkin commented Feb 13, 2019

willfarrell commented Apr 16, 2018 •

edited

Loading

coveralls commented Apr 16, 2018 •

edited

Loading

epoberezkin commented Apr 20, 2018 •

edited

Loading

willfarrell commented Apr 20, 2018 •

edited

Loading

epoberezkin commented Apr 20, 2018 •

edited

Loading

willfarrell commented Apr 25, 2018 •

edited

Loading

willfarrell commented Apr 26, 2018 •

edited

Loading

marqu3z commented May 16, 2018 •

edited

Loading

ruchimutneja commented Feb 13, 2019 •

edited

Loading

epoberezkin commented Feb 13, 2019 •

edited

Loading