Feature: Transform strings #64
Supports: trim, trimLeft, trimRight, lowercase, uppercase, enumcase
@willfarrell thank you, it's great! I was thinking a lot about what to call this keyword. "coerce" in JS means "change type", which is not what this keyword is doing, so I don't think we should use it - it would confuse people, particularly given that ...
Maybe just "transform" is fine; it can even later evolve to support other types (with other transformations). trim, trimleft, trimright, lowercase, uppercase - should we maybe just use JS string method names, at least while mutations are nowhere near the spec? Still figuring out what enumcase does...
No, the passed variable is updated but it is never passed back - nothing we can do about it, as JS does not support passing scalars by reference, it's not C. EDIT: ajv could, in theory, pass it back in some way, e.g. via some function property (I know what people think about it, performance be damned :) But it's not an issue of this package anyway.
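To illustrate the point about scalars, here is a minimal sketch in plain JavaScript (no ajv involved). The helper names `lowerByValue` and `lowerInParent` are hypothetical and exist only for this illustration; the sketch shows why a transformed string is only visible to the caller when it is written back through its parent object or array.

```javascript
// Hypothetical helpers for illustration; not part of ajv or ajv-keywords.

// Reassigning a string parameter only rebinds the local variable.
function lowerByValue(value) {
  value = value.toLowerCase()
  return value
}

// Writing through the parent container is visible to the caller.
function lowerInParent(parent, key) {
  parent[key] = parent[key].toLowerCase()
}

let scalar = 'MixCase'
lowerByValue(scalar)           // the caller's `scalar` is unchanged

const wrapped = ['MixCase']
lowerInParent(wrapped, 0)      // wrapped[0] is updated in place

const record = {name: 'MixCase'}
lowerInParent(record, 'name')  // record.name is updated in place
```

This is why a top-level string cannot be mutated by a keyword, while a string that sits inside an object or array can.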
Unless you need it, I'd leave it until there is demand, and stay with more generic methods. I was thinking split could be useful, but that requires some syntax extension(s), so I'd leave it to the future as well.
I like the idea, maybe also postpone though, unless you have a use case where you need it. In this case the above features (html, markdown, etc) could be in a separate package - a plugin for a plugin...
Totally agree,
The use case I'm building this for is: we have data scientists upload massive CSV files. One column is units that we validate with
It's a super fringe case in my opinion; I added it in the readme. If it comes up we can address it then. We can make an issue for it, so it's caught.
I've been reflecting on which transforms should be included, what their names should be, and how plugins could work.
First, shorten the list to the most common cases:
Second, allow an object of functions to be passed in as an option for the
const leftpad = require('left-pad')
// TODO allow options for each keywords to be passed in - idea WIP
defineKeywords(new Ajv, ['sanitize'], {
sanitize: {
trimleft: function (value, parentSchema) {
return value.replace(/^[\s]+/, '')
},
'leftpad-zero': function(value, parentSchema) {
return leftpad(value,10, '0')
}
}
})
// jsonschema example: "sanitize": ["uppercase","split-comma"]
This would allow complete customization and flexibility for those transforms that need some extra configuration. Since these will mostly be straightforward transformations, or a simple wrapper like
After writing this, I think we should use
I'll update the PR to
I'm in favour of "transform", as "sanitize" feels more narrow (also British people will not like the spelling ;) - these transformations potentially have wider applicability. Let's not rush into the decision about the extensions, let's close the base case first. I think the syntax decision should be made first, as instead of:
I would prefer
and
so an extended syntax would be to pass an array of objects (so order can be managed), and within each object some ordering should be agreed and defined (definitely not the order of keys in JSON). Also, in the case of a single object, the array can possibly be dropped entirely, like in the second example. If we adopt such syntax, then extensions will be a bit more generic as well. Also, please have a look at how extensions are passed to other keywords - e.g. dynamicDefaults. In any case, let's not overload this PR with any extensions and do the simple thing first: "transform" for strings only, with "trim", "trimLeft/Right", "toUpperCase", "toLowerCase" (this will also make it easier to map to JS methods - they're just their names...). Is there anything bad about it?
"enumCase" - still looking...
Ok, I understand now how "toEnumCase" :) works. I missed that enum "uniqueness" limitation and thought that any values are allowed. You also need to check that all "enum" values are strings - they can be anything and JS will automatically coerce; we don't need that, it should just throw.
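The two checks mentioned above (enum values must be strings, and must be unique when compared case-insensitively) can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the PR's code: the function names `buildEnumCaseLookup` and `toEnumCase`, the error messages, and the choice to return non-matching values unchanged are all made up for the example.

```javascript
// Hypothetical sketch; names and error messages are illustrative only.

// Build a lowercase -> original-casing lookup, throwing on invalid enums.
function buildEnumCaseLookup(enumValues) {
  const lookup = new Map()
  for (const v of enumValues) {
    if (typeof v !== 'string') {
      throw new Error('toEnumCase: all "enum" values must be strings')
    }
    const key = v.toLowerCase()
    if (lookup.has(key)) {
      throw new Error('toEnumCase: "enum" values must be unique ignoring case')
    }
    lookup.set(key, v)
  }
  return lookup
}

// Replace the value with the enum entry that matches it ignoring case.
// Assumption: a value with no match is returned unchanged.
function toEnumCase(value, lookup) {
  const match = lookup.get(value.toLowerCase())
  return match === undefined ? value : match
}

const units = buildEnumCaseLookup(['kg', 'mL', 'MHz'])
const fixed = toEnumCase('ml', units)         // matches 'mL' ignoring case
const unknown = toEnumCase('furlong', units)  // no match, returned as-is
```

Throwing at schema-compile time rather than silently coercing keeps a malformed "enum" from going unnoticed.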
If you dislike JS methods and camelCase then let's use hyphen-case, like in formats: "trim-right", "upper-case" etc.
Took a look at
Naming: "trim", "trimLeft/Right", "toUpperCase", "toLowerCase", "toEnumCase". I was on the fence on which way to go. JS method names work for me. I'll update.
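As a rough illustration of why JS method names are convenient here, the agreed names can be mapped onto string methods and applied in array order. This is a sketch under assumptions, not the PR's implementation: `transforms` and `applyTransforms` are hypothetical names, and returning non-strings untouched is a choice made for this example.

```javascript
// Hypothetical sketch; not the code from the PR.
const transforms = {
  trim: s => s.trim(),
  trimLeft: s => s.trimStart(),
  trimRight: s => s.trimEnd(),
  toLowerCase: s => s.toLowerCase(),
  toUpperCase: s => s.toUpperCase()
}

// Apply the named transforms left to right.
function applyTransforms(value, names) {
  // Assumption for this sketch: non-strings are returned untouched.
  if (typeof value !== 'string') return value
  return names.reduce((acc, name) => {
    if (!Object.prototype.hasOwnProperty.call(transforms, name)) {
      throw new Error('unknown transform: ' + name)
    }
    return transforms[name](acc)
  }, value)
}

const out = applyTransforms(' Object ', ['trim', 'toLowerCase'])
```

Because the array is processed in order, `['trim', 'toLowerCase']` first strips whitespace and then lowercases the result.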
Good catch! I feel we should ignore non-strings during the transform process,
- tweaked option names to match JS method name style
- added a fix for non-string types
lol
Correct. Also, given that the keyword definition limits it to strings, everything else will not be passed to this keyword, so you don't need to specifically check that the passed value is a string; you can trust ajv on that :). At least, not until we make it multi-type (I can see some use cases, but let's see if there is demand).
multi-type test included
I was testing it out today; it works well for Node.js, where one loads keywords in explicitly (as per the tests). However, when I compile with
Compiling:
const schema = require('../dist/json-schema')
const Ajv = require('ajv')
const pack = require('ajv-pack')
const ajv = new Ajv({
v5: true,
format: 'full',
coerceTypes: true,
allErrors: true,
useDefaults: true,
sourceCode: true
})
require('ajv-keywords')(ajv, ['transform'])
const validate = ajv.compile(schema)
const moduleCode = pack(ajv, validate)
// save to file to include elsewhere
Unit Test:
it('should transform after being packed', function() {
var schema, data
data = {o: ' Object '};
schema = {type: 'object', properties: {o: {type: 'string', transform: ['trim', 'toLowerCase']}}};
var ajv = defineKeywords(new Ajv, 'transform')
var validate = ajv.compile(schema);
var moduleCode = ajvPack(ajv, validate); // line 24
var packedValidate = requireFromString(moduleCode)
packedValidate(data).should.equal(true)
data.should.deep.equal({o:'object'});
})
/*
TypeError: Cannot read property 'patterns' of undefined
at Object.generate_gen_validate [as gen_validate] (node_modules/ajv-pack/lib/dotjs/gen_validate.js:4:33)
at generate_gen_single (node_modules/ajv-pack/lib/dotjs/gen_single.js:6:19)
at module.exports (node_modules/ajv-pack/lib/pack_validate.js:9:14)
at Context.<anonymous> (spec/transform.spec.js:24:22)
*/
Edit:
@willfarrell ajv-pack only supports inline (and probably macro) keywords. Rather than making this one inline - it won’t be extensible if you do it - it could be more valuable to figure out how to make ajv-pack support other keyword types. Sorry, I didn’t merge yet - really busy, will review and merge over the weekend.
Was looking into
No worries on the merge delay; big PRs sometimes need a week to catch any loose ends and flesh out the idea. I'll be available most of the weekend, so hopefully I can reply quickly if you need to follow up. Excited to get this in. Thanks for all the help, and the time you take for this project is much appreciated.
@willfarrell thank you. I only removed "schema" from the definition; it is only used for "validate" keywords.
Hi guys, is it intended behaviour to apply the transformation after validation and not before?
const Ajv = require('ajv')
const ajv = new Ajv
require('ajv-keywords')(ajv, ['transform'])
const schema = {
type: 'object',
properties: {
name: {type: 'string', format: 'email', transform: ['trim', 'toLowerCase']}
}
}
const data = {name: ' FoO@BaR.CoM '}
ajv.validate(schema, data) // false
data // { name: ' FoO@BaR.CoM ' }
To fix this the
type: 'string', allOf: [
{transform: ['trim', 'toLowerCase']},
{format: 'email'}
]
This trick was the workaround proposed for almost all of the issues opened about this topic. In my use cases I never had the need to transform values after validation, but always before.
Other things I noticed:
- The doc says the transformation is applied before validation:
- The doc has a wrong key in the example
The intended order of execution is indeed that the transform happens before validation. I'm already using it in production, so it is working. However, I can confirm that in your specific use case it is failing to be executed at all. I would suggest opening a separate issue for this.
const Ajv = require('ajv')
const ajv = new Ajv
require('ajv-keywords')(ajv, ['transform'])
const schema = {
type: 'object',
properties: {
name: {
type: 'string',
allOf: [
{format: 'email'},
{transform: ['trim', 'toLowerCase']} // not executed
]
}
}
}
const data = {name: ' FoO@BaR.CoM '}
const valid = ajv.validate(schema, data)
console.log(valid, data) // false { name: ' FoO@BaR.CoM ' }
The
Good catch on the docs; we went through a few versions of naming conventions until we settled on the current names. See #69 for the update.
In general I have always been against allowing keyword execution order to be managed; the JSON Schema spec goes further, saying that the results should not depend on the order of execution. Ajv obviously allows extending the spec, but it also guarantees that allOf subschemas are executed in array order, so from the point of view of the “explicit over implicit” principle it is not a workaround - it’s the correct usage, one that highlights the fact that in this particular case the execution order is important and makes it visible in the schema, not only in the code that defines the Ajv instance. If you care about other platforms at all, adding $comment in such places may help as well. Allowing the execution order of keywords to be managed at the point where the keyword is defined would make an already non-portable schema (across languages) even less portable - you would have to make sure that all apps/services/UI clients that use this schema define keywords in the same way for it to work. With allOf it’s easier. I will think more about it :)
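A schema fragment along the lines described above might look like the following. It is only a sketch: the variable name `nameSchema` and the wording of the `$comment` are made up for illustration, and the fragment assumes an Ajv instance that already has the "transform" keyword from ajv-keywords registered and that accepts `$comment`.

```javascript
// Schema fragment only; the name `nameSchema` and the comment text are
// illustrative. It assumes "transform" is registered on the Ajv instance.
const nameSchema = {
  type: 'string',
  allOf: [
    {
      $comment: 'Order matters: transform must run before the format check.',
      transform: ['trim', 'toLowerCase']
    },
    {format: 'email'}
  ]
}
```

Keeping the ordering in the schema itself, with a comment explaining it, means a reader on another platform can at least see that the order is significant.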
Is there any workaround for stripping HTML tags?
@ruchimutneja None that I know of. We are tracking the addition of that feature at #66. It's something I'd like to build this year, but time will tell. To prevent bloating this package, we may have to create a plugin approach just for it. I don't know your specific case, but likely it's only one or maybe two fields that would require HTML sanitization. In this case, I would suggest you run your own custom transform script over the particular parameter on the output of ajv. Hope this helps.
@willfarrell, I have many rich text fields and need to strip tags before validating the data.
I tried this script just to transform my string to lowercase. Will it be mutating the data? Because my output is the same; it's not getting converted to lowercase.
@ruchimutneja if you have many, it might not be as feasible. Do you have the latest version? Try v3.2.0 or v3.4.0; there was a transform bug in v3.3.0.
@ruchimutneja please see the docs on how to use this package - you are not adding keywords to the instance in your code sample.
Did this, not working for me. Not sure what I am missing here.
The issue is, I need to pass the data in array format; even a single string I had to pass as ["MixCase"], and I changed the schema accordingly.
@ruchimutneja you need to review the JSON Schema docs; this is definitely not the right place to ask for advice on JSON Schema usage - please search/ask on StackOverflow.
Re scalar vs array - it's covered in the docs: strings are passed by value, not by reference.
Supports the modification of strings: trimming, lowercase, uppercase, enumcase
Closes:
Issues I ran into:
TODO:
- coerce, sanitize, transform, or other
- trim, trimleft, trimright, lowercase, uppercase, enumcase