TODO: sorting through populated fields (e.g. sort users by role.name) #316
While doing research to patch this problem I found an even bigger problem (I think) in the |
Hi @lsarrazi! Thanks for your interest in the project! There is a task to add sorting through populated fields, but no work has been done on that yet. For the […] Just as an FYI, I'm currently working on a major update to the framework, so depending on the results of this task I might be able to add this as part of that update. Thanks! |
Hello @JKHeadley, I really do think this project is awesome. Glad to hear you're still working on it :)
Is there something missing here, or a detail you want to specify? I also want to ask: in your view, what are the differences (pros/cons) between this output for 'getAll':
[
  { // child model
    association: { // linking model
      relationship: 'friend'
    },
    name: 'bob',
  },
  ...
]
versus:
[
  { // linking model
    child: { // child model
      name: 'bob'
    },
    relationship: 'friend'
  },
  ...
] |
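To make the difference between the two shapes concrete, here is a small sketch that converts results in the second shape (linking documents with the child nested inside) into the first shape (child documents with the linking fields nested under `association`). The `child` and `association` keys come from the examples above; the function name `toChildShape` is hypothetical.

```javascript
// Convert results in the second shape (linking documents with the child
// nested inside) into the first shape (child documents with the linking
// fields nested under `association`).
// Assumption: each linking document stores its child under `childKey`.
function toChildShape(linkingDocs, childKey = "child") {
  return linkingDocs.map((link) => {
    // Everything except the child itself describes the relationship.
    const { [childKey]: child, ...association } = link;
    return { ...child, association };
  });
}

const linking = [{ child: { name: "bob" }, relationship: "friend" }];
console.log(JSON.stringify(toChildShape(linking)));
// prints [{"name":"bob","association":{"relationship":"friend"}}]
```

One trade-off this makes visible: in the first shape, a child field that happens to be named `association` would collide with the wrapper key, while the second shape keeps the two models' fields in separate namespaces.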
Thanks @lsarrazi! All thoughts are greatly appreciated here. I have thought about this for a while, but haven't truly brainstormed it yet. I'll try to share some of my thoughts here with you, though please keep in mind this is not a thorough review of all aspects that need to be considered. When it comes to fetching nested associations, there are 3 major topics that come to mind:
If we were to properly handle nested associations, I think you are correct in assuming we need to handle all the same functionality/query parameters that a normal query supports. In effect, I think this means we would need to support objects in addition to strings as part of the
In the example above, the original […] This is just an example, but I think you can see how this could be further nested. It is only one possible way to update the query format to accept more advanced embedding. On the implementation side, I am not sure yet what the best approach will be, whether it's updating the current usage of the mongoose […]
Assuming we are able to implement these updates effectively, I think this would solve the initial issue that you pointed out with the […] Now, this updated […]
On a related note, we would need to update query validation to handle the new query parameter functionality based on the type of association being queried. For example, it should cause an error to include a
Similarly, attempting to sort via a ONE_MANY or MANY_MANY association should throw an error as well. Ex:
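The rule of rejecting sorts through to-many associations could be sketched roughly as follows. This is a hypothetical validator, not rest-hapi's API: it assumes an `associations` map from association name to an object with a `type` (using the ONE_MANY / MANY_MANY names from this thread), and a leading "-" on a sort field for descending order.

```javascript
// Association types that yield many documents per parent. Sorting through
// them is ambiguous: which child's value should the parent be ordered by?
const TO_MANY_TYPES = new Set(["ONE_MANY", "MANY_MANY"]);

// Hypothetical validator. `sortFields` are dotted paths, optionally
// prefixed with "-" for descending; `associations` maps an association
// name to its metadata, e.g. { role: { type: "MANY_ONE" } }.
// Only the first path segment is checked; a full version would walk
// nested associations segment by segment.
function validateSortFields(sortFields, associations) {
  for (const raw of sortFields) {
    const path = raw.startsWith("-") ? raw.slice(1) : raw;
    const head = path.split(".")[0];
    const association = associations[head];
    if (association && TO_MANY_TYPES.has(association.type)) {
      throw new Error(
        `Cannot sort by "${path}": "${head}" is a ${association.type} association`
      );
    }
  }
  return true;
}

const associations = { role: { type: "MANY_ONE" }, groups: { type: "MANY_MANY" } };
validateSortFields(["-role.name", "email"], associations); // passes
// validateSortFields(["groups.name"], associations) would throw
```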
Lastly, I certainly haven't thought out all the implications these updates create for other features such as Authorization, but I think it's important to bring up. This might actually already be an existing issue. For example, we might have a schema that includes a […] It may be a solution to include authorization checks within the process of constructing the mongoose/mongo query. Essentially, for each embedded field we would have to check the authorization rules applied to that field and cross-reference them with the scopes applied to the user. Of course, authorization doesn't apply when using the mongoose wrappers directly, unless 'restCall=true' is applied.
As a final thought, from my experience so far the most time-consuming aspect of adding/updating features such as this is making sure each new case is tested thoroughly. I have mainly been approaching this through e2e tests, but such a big change would likely need more unit tests as well.
As a final final thought :) it might be helpful to look at the Prisma framework for inspiration. I believe they have already solved many of these issues; however, Prisma is purely an ORM, whereas rest-hapi is intended to provide a full REST API solution.
Thanks again for your interest. Looking forward to hearing your thoughts if you would like to share :) |
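A rough sketch of checking authorization per embedded field while building the query, as floated above. Everything here is hypothetical: `readScopes` is an assumed map from a dotted path to the scopes allowed to read it, and paths without a rule are treated as unrestricted. A nested embed is kept only if every prefix of its path is readable, so `role.permissions` also requires access to `role`.

```javascript
// Hypothetical per-field read rules: dotted path -> scopes allowed to read it.
// Paths without a rule are treated as unrestricted (an assumption).
function filterAuthorizedEmbeds(embedPaths, readScopes, userScopes) {
  const scopes = new Set(userScopes);
  const canRead = (path) => {
    const allowed = Object.prototype.hasOwnProperty.call(readScopes, path)
      ? readScopes[path]
      : undefined;
    return !allowed || allowed.some((scope) => scopes.has(scope));
  };
  return embedPaths.filter((path) => {
    const parts = path.split(".");
    // Every prefix must be readable: "role.permissions" also needs "role".
    for (let i = 1; i <= parts.length; i++) {
      if (!canRead(parts.slice(0, i).join("."))) return false;
    }
    return true;
  });
}

const rules = { role: ["admin", "user"], "role.permissions": ["admin"] };
console.log(filterAuthorizedEmbeds(["role", "role.permissions"], rules, ["user"]));
// prints [ 'role' ]
```

Here unauthorized embeds are silently dropped; rejecting the whole request with a 403 would be an equally reasonable choice.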
As for your question about the output variations for 'getAll', this is a good question. The second approach actually looks closer to the results you receive when you […] They both have pros and cons. To begin with, here are the current issues I see:
Some pros and cons between the formats:
|
Thanks for the answers @JKHeadley :) Here is the code we can use to get all the linking documents […] Let's jump in the […]
let mongooseQuery = ownerModel.aggregate([
  {
    $match: { // match the owner model, just like a regular model.findById()
      _id: new mongoose.Types.ObjectId(ownerId),
    },
  },
  { // The equivalent of a "populate": a left outer join that fetches the owner's child ranking_entity documents
    $lookup: {
      from: "ranking_entity",
      localField: "_id",
      foreignField: "ranking",
      as: "entitys",
    },
  },
  { // Similar to a flatMap: outputs one document per element of "entitys", so we can iterate through the ranking_entity documents
    $unwind: {
      path: "$entitys",
    },
  },
  { // Another "populate" to retrieve the child model "entity" of each "ranking_entity". This block can be replicated to embed any field of the linking or child model.
    $lookup: {
      from: "entities",
      localField: "entitys.entity",
      foreignField: "_id",
      as: "entitys.entity",
    },
  },
  { // $lookup outputs an array, not a single element, so we replace the array with its first element, since a linking document always has exactly one corresponding child document. Like the previous stage, this block can be EXTENDED (not replicated this time, which may be trickier) to embed any field of the linking or child model.
    $set: {
      entitys: {
        entity: {
          $arrayElemAt: ["$entitys.entity", 0],
        },
      },
    },
  },
  { // Sort on "entitys.rating", a field of the linking model. This works on ANY deeply nested field, including the ones we might $embed (see the commented example below)
    $sort: {
      "entitys.rating": -1, // <-- sort entitys by their linking model "rating" field
      // "entitys.entity.name": 1, <-- sort entitys by their name, which is not in the linking model but in the child model
    },
  },
  { // Pagination goes after the sort: if $skip/$limit ran before the $sort, only the paginated subset would be sorted, not the whole set
    $skip: 2,
  },
  {
    $limit: 2,
  },
  { // Regroup the paginated documents into a single array
    $group: {
      _id: "$_id",
      entitys: {
        $push: "$entitys",
      },
    },
  },
])
// I skipped a lot of checks here to keep the code clear, but it's basically just this:
let result;
result = await mongooseQuery.exec()
result = result[0];
result = result['entitys']
return result; // this is the response returned by my _getAllHandler function
Other query parameters, like field match or field $exclude, can easily be implemented with a $match and a $unset aggregation stage AFTER or BEFORE the sorting stage (this may need some benchmarks). The output of this query on my database is the following, just so you can see the format (serialized):
[
{
"_id": "63c5988eb293294147ef190d",
"entity": {
"_id": "63c597cfaaf417e3dc7050de",
"name": "Cell",
"description": "occaecat id commodo enim minim",
"createdAt": "2023-01-16T18:30:39.558Z",
"__v": 0
},
"ranking": "63c349c8c4d1bdb75e8cee75",
"__v": 0,
"rating": 1500,
"rd": 400,
"vol": 0.06
},
{
"_id": "63c34b2fb293294147ef0253",
"entity": {
"_id": "63c34918df112e6d9c20c316",
"name": "Vegeta",
"description": "non",
"createdAt": "2023-01-15T00:30:16.851Z",
"__v": 0
},
"ranking": "63c349c8c4d1bdb75e8cee75",
"__v": 0,
"rating": 1350.1746357829322,
"rd": 228.93669034890752,
"vol": 0.05999915643996186
}
]
As you can see, it follows the second output option that I mentioned. Let's talk complexity for a second: if my intuition is right, we should drop from […]
<-- That's why I believe we should keep the pagination and sort stages together, just to be sure :p And of course, because of the problem I mentioned in my very first message, the simplest query complexity should drop from […] I think the only challenge here is to generalize the embed mechanism for any deeply nested field using the $lookup and $set stages. Otherwise everything seems good to me; is there something I didn't think of? I didn't think about any authorization mechanism on the document fields, but I hope you'll have some ideas on that :) |
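Following the suggestion to keep the sort and pagination stages together, here is a sketch of a hypothetical helper (`appendListStages` is not an existing rest-hapi function) that appends a filter ($match), the $sort/$skip/$limit stages as one unit, and an optional $unset for excluded fields to a pipeline that already embeds the associations. It only builds stage objects. Placing $match before the sort is one choice; as noted above, the best placement may need benchmarking.

```javascript
// Hypothetical helper: appends the list-query stages to a pipeline that
// already embeds the associations (e.g. $match/$lookup/$unwind/$set above).
function appendListStages(pipeline, { match, sort, skip = 0, limit, exclude = [] }) {
  const stages = [...pipeline];
  // Filtering first shrinks the set that has to be sorted.
  if (match && Object.keys(match).length > 0) stages.push({ $match: match });
  // $sort, $skip and $limit are always emitted together and in this order,
  // so pagination applies to the fully sorted set.
  if (sort && Object.keys(sort).length > 0) stages.push({ $sort: sort });
  if (skip > 0) stages.push({ $skip: skip });
  if (limit !== undefined) stages.push({ $limit: limit });
  // Field exclusion ($exclude) only changes the output shape, so it runs last.
  if (exclude.length > 0) stages.push({ $unset: exclude });
  return stages;
}

const stages = appendListStages([{ $unwind: { path: "$entitys" } }], {
  match: { "entitys.rating": { $gte: 1000 } },
  sort: { "entitys.rating": -1 },
  skip: 2,
  limit: 2,
  exclude: ["entitys.__v"],
});
console.log(stages.map((stage) => Object.keys(stage)[0]));
```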
@lsarrazi This is great! At a high level, here we are basically discussing the mongoose […] Regardless, I think it would be very beneficial to implement solutions using both, and allow the user to configure which implementation they prefer. Again, my instinct is that using aggregations in general will be more performant, since I imagine it takes advantage of internal MongoDB optimizations (such as the ones you point out). As far as the […]
As you mentioned, the trick here will be to generalize the approach you have demonstrated, and to make sure it supports all current features/functionality. I think this can certainly be done. My first thought is to approach it with a type of 'query builder' utility. There might be some existing libraries we can take advantage of, or we could just develop our own. In essence, this would take in all the request parameters and build a single aggregation query. As far as authorization goes, I think we could find a way to integrate authorization checks within the query builder utility.
One final note about performance. I do consider performance to be very important; however, my priority for this project has always been functionality/robustness as much as possible. I think this approach is fitting simply because rest-hapi is designed to always allow the user to take advantage of the flexibility MongoDB provides, and one of the main advantages of that flexibility is that it allows you to optimize your schema for a specific application, which will almost always (in my opinion) be the biggest factor in performance. As a quick example, using denormalization (via the […]
I greatly appreciate your contributions here. Please let me know if there's anything I've missed (or if I'm just wrong, haha). If you would like to contribute to the code itself, I'd be happy to assist you however I can. I'm very excited about these ideas and I hope to make progress on them as soon as I'm done with my current updates. |
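A query builder would need to turn request parameters into stages. Below is a sketch of one small piece of that, under stated assumptions: `parseSort` and `buildQuery` are hypothetical names, and a leading "-" is assumed to mean descending. It also appends a unique tie-breaker to the sort, because MongoDB's $sort does not guarantee an order between documents with equal keys, which can make $skip/$limit pages overlap or skip documents.

```javascript
// Parse sort parameters such as ["-entitys.rating", "entitys.entity.name"]
// into a $sort specification. Assumption: a leading "-" means descending.
// A unique tie-breaker is appended so that paginated results are consistent.
function parseSort(sortParams = [], tieBreaker = "_id") {
  const spec = {};
  for (const field of sortParams) {
    if (field.startsWith("-")) spec[field.slice(1)] = -1;
    else spec[field] = 1;
  }
  if (!(tieBreaker in spec)) spec[tieBreaker] = 1;
  return spec;
}

// Hypothetical query-builder entry point: combines the stages that embed
// associations with the sort and pagination request parameters.
function buildQuery(embedStages, { $sort, $skip = 0, $limit } = {}, tieBreaker = "_id") {
  const pipeline = [...embedStages, { $sort: parseSort($sort, tieBreaker) }];
  if ($skip > 0) pipeline.push({ $skip });
  if ($limit !== undefined) pipeline.push({ $limit });
  return pipeline;
}

console.log(parseSort(["-entitys.rating"], "entitys._id"));
// prints { 'entitys.rating': -1, 'entitys._id': 1 }
```

In the ranking pipeline shown earlier, every document produced by $unwind carries the owner's `_id`, so the tie-breaker there would need to be a per-child field such as `entitys._id`.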
Thanks, it's a pleasure 😊 To be honest, I was not aware of the duplicate fields feature at all. Would sorting benefit from indexes on the duplicated fields? I do not believe that […] One last thought on the validation part: there might be an infinite number of possibilities for the $embed query with nested fields, for example: |
Sounds great! I'll try to put aside some time to specify the query builder requirements for you. The sorting benefit from denormalizing the data through duplicate fields is that you no longer have to perform a $lookup or populate to be able to sort by a field in an association. So as an example user document:
If you want to sort a user query by 'role.name', you would first have to perform a $lookup or populate to get the role data. If you use duplicate fields to denormalize the 'role.name' field into the user document, then you would have a user document with a duplicate field:
Now you can simply sort the user query by the 'roleName' field, which avoids the extra processing involved with $lookup or populate. The […] As far as Joi validation for nested […] Anyway, thanks again for the ideas. Feel free to continue the conversation or ask about anything else. |
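To illustrate the difference, here are the two pipelines for sorting users by role name, as a sketch: one joins the role first, the other sorts directly on the denormalized field. The collection name `roles` and the fields `role` / `roleName` are assumptions about the schema. This also bears on the earlier index question: a $sort at the start of a pipeline can use an index on `{ roleName: 1 }`, whereas a sort on joined data cannot.

```javascript
// Without denormalization: join the role document, then sort on its name.
// Because the sort runs on joined data, it cannot use an index.
function sortUsersByRoleNameViaLookup() {
  return [
    { $lookup: { from: "roles", localField: "role", foreignField: "_id", as: "role" } },
    // $lookup returns an array; keep the single matching role document.
    { $set: { role: { $arrayElemAt: ["$role", 0] } } },
    { $sort: { "role.name": 1 } },
  ];
}

// With a duplicate field: roleName already lives on the user document, so no
// join is needed, and a $sort at the start of the pipeline can use an index
// on { roleName: 1 }.
function sortUsersByRoleNameDenormalized() {
  return [{ $sort: { roleName: 1 } }];
}
```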
Hi there, I just wanted to let you know that I think I have successfully generalized the embedding mechanism for any deeply nested field. Sorting also works without any extra effort. I essentially re-created a populate function that generates $lookup and $set aggregation stages, and I'm able to validate whether a deeply nested field exists before executing the Mongo query. This might also be a way to validate authorization for a given model, or a field on a model, so that we don't embed things that aren't allowed, for example. |
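The general idea of a populate function that emits $lookup and $set stage pairs for dotted embed paths, validating each segment before any query runs, might look roughly like this. This is a sketch, not the implementation described above: the `models` registry shape is hypothetical, `ranking_entity` and `entities` come from the earlier pipeline, and `ranking`/`owner`/`user` are made-up names. Only to-one references are handled.

```javascript
// Hypothetical model registry: each model's collection and the fields that
// hold a to-one reference to another model.
const models = {
  ranking_entity: { collection: "ranking_entity", refs: { entity: "entity", ranking: "ranking" } },
  ranking: { collection: "rankings", refs: {} },
  entity: { collection: "entities", refs: { owner: "user" } },
  user: { collection: "users", refs: {} },
};

// Turn dotted embed paths (e.g. "entity.owner") into $lookup + $set stage
// pairs. `prefix` is where the root documents sit in the pipeline
// (e.g. "entitys" after the $unwind in the example pipeline).
function buildEmbedStages(rootModel, embedPaths, registry, prefix = "") {
  if (!registry[rootModel]) throw new Error(`Unknown model "${rootModel}"`);
  const stages = [];
  const embedded = new Set(); // avoid joining the same path twice
  for (const path of embedPaths) {
    let model = registry[rootModel];
    const segments = path.split(".");
    segments.forEach((segment, i) => {
      const relPath = segments.slice(0, i + 1).join(".");
      const target = registry[(model.refs || {})[segment]];
      if (!target) throw new Error(`Unknown association "${relPath}" on ${rootModel}`);
      const fullPath = prefix ? `${prefix}.${relPath}` : relPath;
      if (!embedded.has(fullPath)) {
        embedded.add(fullPath);
        stages.push(
          { $lookup: { from: target.collection, localField: fullPath, foreignField: "_id", as: fullPath } },
          // $lookup yields an array; a to-one reference has at most one match.
          { $set: { [fullPath]: { $arrayElemAt: [`$${fullPath}`, 0] } } }
        );
      }
      model = target;
    });
  }
  return stages;
}
```

Because every segment is resolved against the registry first, the same walk is a natural place to hook in per-model or per-field authorization checks.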
@lsarrazi That's great news! Thanks for keeping me updated. Here are some initial requirements for a rest-hapi query builder, which we can discuss and modify as needed. The list is in no particular order:
Notes: Since we will be supporting pagination for embedded associations, the response format of embedded results should support pagination. For example, instead of:
We would have:
This should be the default format, but we should allow the user to configure a […] I haven't looked much into how the $term and $text parameters might work when applied to embedded associations. |
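Since embedded results would need to support pagination, one possible way to return a page together with its total count in a single query is a $facet stage placed after the $sort, running the page and a $count over the same input. The helper names `facetPage` and `toPage` and the `docs` / `total` / `hasNext` keys are hypothetical, not the format specified above.

```javascript
// Hypothetical: wrap the page in a $facet (placed after $sort) so a single
// query returns both the requested page and the total number of matches.
function facetPage({ skip = 0, limit }) {
  if (!(limit > 0)) throw new Error("limit must be a positive number");
  const page = skip > 0 ? [{ $skip: skip }, { $limit: limit }] : [{ $limit: limit }];
  return { $facet: { docs: page, total: [{ $count: "count" }] } };
}

// $facet outputs one document: { docs: [...], total: [{ count: n }] }.
// When nothing matches, `total` is an empty array rather than [{ count: 0 }].
function toPage(facetResult, { skip = 0, limit }) {
  const total = facetResult.total.length > 0 ? facetResult.total[0].count : 0;
  return {
    docs: facetResult.docs,
    total,
    hasNext: skip + limit < total,
  };
}
```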
Hello @JKHeadley, I hope you're doing well. I have 2 questions:
Have a good day, I'll stay tuned :) |
Hey @JKHeadley, any news about the features you're developing? Are you done with them? |
Hi @lsarrazi, thanks for checking in. The current features are coming along slowly but surely. Life has picked up recently, so I don't have as much time lately, but that just seems to be how these projects go. Apologies, I thought I had responded to the previous message. No worries if you feel it's too much to work in association embedding at this time; it's usually best to build a solid foundation first anyway. The benefits of association embedding can be twofold: 1) fewer HTTP requests and 2) it opens the possibility of taking advantage of a more optimized DB query. As for the questions:
|
Hi there, I did some work here: https://github.com/lsarrazi/rest-hapi/tree/aggregation-query-builder It's far from done, but I think I've covered a good part of the features we need. I don't have much time these days, sorry for taking so long to reply. Tell me your thoughts on it. Have a good day |
@lsarrazi thanks! I will try to check it out soon. No worries for any delays, that's how this work goes sometimes :) |
Hi everyone,
I got super excited about your project recently and started to write an API with it.
Great job on everything that has been done so far.
I wanted to know if there are plans to implement sorting through populated fields (or at least through linking model fields)?