Support for XML and XPath and numpy/pandas libraries #1265

Open
wants to merge 62 commits into
base: master
62 commits
8f15cdf
Added random number generator
DavidBuzatu-Marian Jun 10, 2024
a53baad
Refactored random function naming
DavidBuzatu-Marian Jun 11, 2024
038d259
Merge branch 'master-davidbuzatu-scripting' into numpy-pandas-lib
DavidBuzatu-Marian Jun 11, 2024
7b6fc14
Introduced error function to rumbledb
DavidBuzatu-Marian Jun 12, 2024
bcc79e4
Refactored codebase with type schema
DavidBuzatu-Marian Jun 12, 2024
532d461
Fixed digitize and floating point issues
DavidBuzatu-Marian Jun 13, 2024
3bec6c4
Finished non-axis methods
DavidBuzatu-Marian Jun 13, 2024
c87e100
Added minimum with axis
DavidBuzatu-Marian Jun 14, 2024
d353959
Merge branch 'master-davidbuzatu-scripting' into numpy-pandas-lib
DavidBuzatu-Marian Jun 14, 2024
a353d48
Fixed error throw issue
DavidBuzatu-Marian Jun 14, 2024
eff118b
Added max function
DavidBuzatu-Marian Jun 14, 2024
0e6b8c8
Added mean
DavidBuzatu-Marian Jun 14, 2024
2b27e7a
Added absolute
DavidBuzatu-Marian Jun 14, 2024
f3a8fc4
Added sort and count nonzero
DavidBuzatu-Marian Jun 17, 2024
dfd45e0
Added unique
DavidBuzatu-Marian Jun 18, 2024
6525a65
Made count-zero work with non-integer
DavidBuzatu-Marian Jun 18, 2024
7bd15d6
Made reshape be n-dimensional
DavidBuzatu-Marian Jun 19, 2024
909c72c
Added median
DavidBuzatu-Marian Jun 20, 2024
f8db52d
Refactored tests structure
DavidBuzatu-Marian Jun 20, 2024
d62d1e4
Refactored functions
DavidBuzatu-Marian Jun 20, 2024
68dfbd8
Rounded logspace with float
DavidBuzatu-Marian Jun 23, 2024
66bd9e5
Added initial pandas implementation for fillna and dropna, together w…
DavidBuzatu-Marian Jul 6, 2024
662d83c
Implemented initial version of describe
DavidBuzatu-Marian Jul 8, 2024
a033ca0
Added isnull
DavidBuzatu-Marian Jul 8, 2024
df1d985
Refactored random generation to be sequence based
DavidBuzatu-Marian Jul 9, 2024
7f7a0c4
Adding DataFrames to pandas methods
DavidBuzatu-Marian Jul 11, 2024
a6baf1c
Updated random and pandas tests
DavidBuzatu-Marian Jul 16, 2024
fb59e02
Updated random and pandas tests
DavidBuzatu-Marian Jul 16, 2024
6956f3d
Fixed dropna
DavidBuzatu-Marian Jul 18, 2024
4eb87e3
Fixed pull conflicts
DavidBuzatu-Marian Jul 18, 2024
10510f3
Fixed fillna
DavidBuzatu-Marian Jul 19, 2024
39c70ad
Fixed describe and added more fillna tests
DavidBuzatu-Marian Jul 21, 2024
24d9345
Added Items classes and creation methods for XML nodes: document, ele…
DavidBuzatu-Marian Jul 17, 2024
2adf707
Removed whitespace-only nodes from tree
DavidBuzatu-Marian Jul 18, 2024
b53e077
Updated grammar to include XPath
DavidBuzatu-Marian Jul 18, 2024
19da620
Refactored items to include node instance
DavidBuzatu-Marian Jul 21, 2024
8205b1d
Added test for xml
DavidBuzatu-Marian Jul 21, 2024
b0065e7
Generated new parser from grammar
DavidBuzatu-Marian Jul 21, 2024
29293dc
Added skeleton for XML nodes
DavidBuzatu-Marian Jul 21, 2024
c91b76d
Added translation visitor
DavidBuzatu-Marian Jul 22, 2024
3267417
Minor adjustments to class types and visitor
DavidBuzatu-Marian Jul 22, 2024
351ebbd
Refactored step expr compulation logic
DavidBuzatu-Marian Jul 23, 2024
b778ab4
Added more tests and fixed bugs
DavidBuzatu-Marian Jul 25, 2024
1434a66
Refactored classes
DavidBuzatu-Marian Jul 25, 2024
d1bb791
Fixed grammar
DavidBuzatu-Marian Jul 25, 2024
13b3647
Fixed wrong grammar for replace and transform expressions
DavidBuzatu-Marian Jul 25, 2024
a0e77c3
Fixed numpy tests
DavidBuzatu-Marian Jul 25, 2024
b04f463
Added initial runtime behavior
DavidBuzatu-Marian Jul 29, 2024
81f20c0
Fixed XML reader
DavidBuzatu-Marian Jul 29, 2024
38dadf0
Fixing bugs in walkers
DavidBuzatu-Marian Aug 6, 2024
8736922
Fixing describe
DavidBuzatu-Marian Aug 7, 2024
0124f19
Added more tests and fixed bugs
DavidBuzatu-Marian Aug 7, 2024
479dfe2
Fixed bugs relating to selection of nodes
DavidBuzatu-Marian Aug 8, 2024
2f1b0a4
Added more tests
DavidBuzatu-Marian Aug 9, 2024
41231ff
Fixed AnnotatedItem comparison error.
DavidBuzatu-Marian Aug 9, 2024
363c897
Merged fixed pandas
DavidBuzatu-Marian Aug 9, 2024
cc6fd8a
Added sorting to nodes
DavidBuzatu-Marian Aug 9, 2024
0921cee
Refactored step expression
DavidBuzatu-Marian Aug 11, 2024
b4a4f52
Finalized predicates
DavidBuzatu-Marian Aug 11, 2024
b82b191
Fixed execution mode bugs
DavidBuzatu-Marian Aug 13, 2024
d7dac1b
Merge branch 'numpy-pandas-lib' into xml-support
DavidBuzatu-Marian Aug 13, 2024
a1bc0c7
Bug fixing
DavidBuzatu-Marian Aug 22, 2024
9 changes: 8 additions & 1 deletion .gitlab-ci.yml
Original file line number Diff line number Diff line change
@@ -46,8 +46,15 @@ FrontendTests:

 RuntimeTests:
   stage: tests3
+  artifacts:
+    name: "Runtime Tests log"
+    paths:
+      - target/runtime_test.log
+    when:
+      always
+    expire_in: 2 days
   script:
-    - mvn -Dtest=RuntimeTests test
+    - mvn -Dtest=RuntimeTests test --log-file target/runtime_test.log

RuntimeTestsNoParallelism:
stage: tests3
2 changes: 1 addition & 1 deletion gen/XQueryParserScripting.interp
@@ -556,7 +556,7 @@ whileStatement
pathExpr
relativePathExpr
stepExpr
axisStep
stepExpr
forwardStep
forwardAxis
abbrevForwardStep
147 changes: 140 additions & 7 deletions src/main/java/org/rumbledb/api/Item.java
@@ -1,11 +1,6 @@
package org.rumbledb.api;

import java.io.Serializable;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.List;
import java.util.Map;

import com.esotericsoftware.kryo.KryoSerializable;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.ml.Estimator;
import org.apache.spark.ml.Transformer;
@@ -20,8 +15,14 @@
import org.rumbledb.serialization.Serializer;
import org.rumbledb.types.FunctionSignature;
import org.rumbledb.types.ItemType;
import org.w3c.dom.Node;

import com.esotericsoftware.kryo.KryoSerializable;
import java.io.Serializable;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;


/**
@@ -297,6 +298,46 @@ default boolean isBase64Binary() {
return false;
}

/**
* Tests whether the item is an XML Element node.
*
* @return true if it is an XML Element node, false otherwise.
*/
default boolean isElementNode() {
return false;
}

/**
* Tests whether the item is an XML Attribute node.
*
* @return true if it is an XML Attribute node, false otherwise.
*/
default boolean isAttributeNode() {
return false;
}

/**
* Tests whether the item is an XML Text node.
*
* @return true if it is an XML Text node, false otherwise.
*/
default boolean isTextNode() {
return false;
}

/**
* Tests whether the item is an XML Document node.
*
* @return true if it is an XML Document node, false otherwise.
*/
default boolean isDocumentNode() {
return false;
}

default boolean getContent() {
return false;
}

/**
* Returns the members of the item if it is an array.
*
@@ -747,4 +788,96 @@ default boolean isTransformer() {
default Transformer getTransformer() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

/**
* Returns the string value of the text item, if it is a text item.
*
* @return the string value.
*/
default String getTextValue() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

/**
* Sets the parent item for all descendants of the current item.
*/
default void addParentToDescendants() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default List<Item> attributes() {
return new ArrayList<>();
}

default String baseUri() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default List<Item> children() {
return new ArrayList<>();
}

default String documentUri() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default boolean isId() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default boolean isIdRefs() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default List<Item> namespaceNodes() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default boolean nilled() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default String nodeKind() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default String nodeName() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default Item parent() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default String stringValue() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default String typeName() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default Item typedValue() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default String unparsedEntityPublicId() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default String unparsedEntityServerId() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default void setParent(Item parent) {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default int compareXmlNode(Item otherNode) {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}

default Node getXmlNode() {
throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
}
}
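
The new defaults on Item follow a common pattern: kind tests return false and most accessors throw unless a concrete node class overrides them. A minimal sketch of that pattern follows. ItemSketch, TextSketch, ElementSketch and NumberSketch are hypothetical stand-ins invented for illustration; they are not classes from this PR and mirror only a few of the added methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NodeDefaultsSketch {

    // Hypothetical, trimmed-down stand-in for Item: kind tests default to
    // false and accessors throw unless a node class overrides them.
    interface ItemSketch {
        default boolean isElementNode() {
            return false;
        }

        default boolean isTextNode() {
            return false;
        }

        default List<ItemSketch> children() {
            return new ArrayList<>();
        }

        default String nodeName() {
            throw new UnsupportedOperationException(
                    "Operation not defined for type " + this.getClass().getSimpleName());
        }

        default String stringValue() {
            throw new UnsupportedOperationException(
                    "Operation not defined for type " + this.getClass().getSimpleName());
        }
    }

    // Text node: overrides only the kind test and the string value.
    static final class TextSketch implements ItemSketch {
        private final String content;

        TextSketch(String content) {
            this.content = content;
        }

        @Override
        public boolean isTextNode() {
            return true;
        }

        @Override
        public String stringValue() {
            return this.content;
        }
    }

    // Element node: its string value is the concatenation of its children's.
    static final class ElementSketch implements ItemSketch {
        private final String name;
        private final List<ItemSketch> children;

        ElementSketch(String name, List<ItemSketch> children) {
            this.name = name;
            this.children = new ArrayList<>(children);
        }

        @Override
        public boolean isElementNode() {
            return true;
        }

        @Override
        public String nodeName() {
            return this.name;
        }

        @Override
        public List<ItemSketch> children() {
            return Collections.unmodifiableList(this.children);
        }

        @Override
        public String stringValue() {
            StringBuilder sb = new StringBuilder();
            for (ItemSketch child : this.children) {
                sb.append(child.stringValue());
            }
            return sb.toString();
        }
    }

    // A non-node item: inherits every default unchanged.
    static final class NumberSketch implements ItemSketch {
    }

    public static void main(String[] args) {
        List<ItemSketch> kids = new ArrayList<>();
        kids.add(new TextSketch("Dune"));
        ItemSketch title = new ElementSketch("title", kids);
        System.out.println(title.nodeName() + " -> " + title.stringValue());

        ItemSketch number = new NumberSketch();
        System.out.println(number.isElementNode());
        System.out.println(number.children().isEmpty());
        try {
            number.nodeName();
        } catch (UnsupportedOperationException e) {
            System.out.println("nodeName() is undefined for non-node items");
        }
    }
}
```

With this shape, callers can probe an item with the kind tests before calling a node accessor, which is presumably why children() and attributes() return empty lists rather than throwing.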
39 changes: 39 additions & 0 deletions src/main/java/org/rumbledb/compiler/CloneVisitor.java
@@ -1045,4 +1045,43 @@ public Node visitVariableDeclStatement(VariableDeclStatement statement, Node arg
return result;
}
// end region scripting

// begin region xml

// @Override
// public Node visitPathExpr(PathExpr expression, Node argument) {
// List<Expression> intermediaryPaths = new ArrayList<>();
// for (IntermediaryPath path : expression.getRelativePathExpressions()) {
// Dash dash = null;
// if (path.getPreStepExprDash() != null) {
// dash = new Dash(
// path.getPreStepExprDash().requiresRoot(),
// (StepExpr) this.visitAxisStep(path.getPreStepExprDash().getAxisStep(), argument)
// );
// }
// StepExpr stepExpr = (StepExpr) this.visitStepExpr(path.getStepExpr(), argument);
// intermediaryPaths.add(new IntermediaryPath(dash, stepExpr));
// }
// PathExpr result = new PathExpr(intermediaryPaths, expression.getMetadata());
// result.setStaticSequenceType(expression.getStaticSequenceType());
// result.setStaticContext(expression.getStaticContext());
// return result;
// }

// @Override
// public Node visitStepExpr(StepExpr expression, Node argument) {
// StepExpr result;
// if (expression.getPostFixExpr() != null) {
// Expression postFixExpr = (Expression) this.visit(expression.getPostFixExpr(), argument);
// result = new StepExpr(postFixExpr, expression.getMetadata());
// } else {
// StepExpr stepExpr = (StepExpr) this.visitAxisStep(expression.getAxisStep(), argument);
// result = new StepExpr(stepExpr, expression.getMetadata());
// }
// result.setStaticSequenceType(expression.getStaticSequenceType());
// result.setStaticContext(expression.getStaticContext());
// return result;
// }

// end region xml
}
20 changes: 20 additions & 0 deletions src/main/java/org/rumbledb/compiler/InferTypeVisitor.java
@@ -103,6 +103,8 @@
import org.rumbledb.expressions.update.RenameExpression;
import org.rumbledb.expressions.update.ReplaceExpression;
import org.rumbledb.expressions.update.TransformExpression;
import org.rumbledb.expressions.xml.PathExpr;
import org.rumbledb.expressions.xml.StepExpr;
import org.rumbledb.runtime.functions.input.FileSystemUtil;
import org.rumbledb.types.BuiltinTypesCatalogue;
import org.rumbledb.types.FieldDescriptor;
@@ -2598,4 +2600,22 @@ public StaticContext visitBlockExpr(BlockExpression expression, StaticContext ar
}

// endregion

// region xml

@Override
public StaticContext visitPathExpr(PathExpr pathExpr, StaticContext argument) {
visitDescendants(pathExpr, argument);
pathExpr.setStaticSequenceType(SequenceType.ITEM_STAR);
return argument;
}

// TODO: Step expressions are currently typed as item*, but the actual type may be narrower. Update to the relevant type.
@Override
public StaticContext visitStepExpr(StepExpr stepExpr, StaticContext argument) {
stepExpr.setStaticSequenceType(SequenceType.ITEM_STAR);
return argument;
}

// endregion
}
68 changes: 68 additions & 0 deletions src/main/java/org/rumbledb/compiler/RuntimeIteratorVisitor.java
@@ -32,6 +32,7 @@
import org.rumbledb.exceptions.UnsupportedFeatureException;
import org.rumbledb.expressions.AbstractNodeVisitor;
import org.rumbledb.expressions.CommaExpression;
import org.rumbledb.expressions.ExecutionMode;
import org.rumbledb.expressions.Expression;
import org.rumbledb.expressions.Node;
import org.rumbledb.expressions.arithmetic.AdditiveExpression;
@@ -114,6 +115,9 @@
import org.rumbledb.expressions.update.RenameExpression;
import org.rumbledb.expressions.update.ReplaceExpression;
import org.rumbledb.expressions.update.TransformExpression;
import org.rumbledb.expressions.xml.PathExpr;
import org.rumbledb.expressions.xml.StepExpr;
import org.rumbledb.expressions.xml.node_test.NodeTest;
import org.rumbledb.items.ItemFactory;
import org.rumbledb.runtime.AtMostOneItemLocalRuntimeIterator;
import org.rumbledb.runtime.CommaExpressionIterator;
@@ -190,6 +194,11 @@
import org.rumbledb.runtime.update.expression.RenameExpressionIterator;
import org.rumbledb.runtime.update.expression.ReplaceExpressionIterator;
import org.rumbledb.runtime.update.expression.TransformExpressionIterator;
import org.rumbledb.runtime.xml.AtomizationIterator;
import org.rumbledb.runtime.xml.PathExprIterator;
import org.rumbledb.runtime.xml.StepExprIterator;
import org.rumbledb.runtime.xml.axis.AxisIterator;
import org.rumbledb.runtime.xml.axis.AxisIteratorVisitor;
import org.rumbledb.types.BuiltinTypesCatalogue;
import org.rumbledb.types.SequenceType;

@@ -944,6 +953,13 @@ public RuntimeIterator visitRangeExpr(RangeExpression expression, RuntimeIterato
public RuntimeIterator visitComparisonExpr(ComparisonExpression expression, RuntimeIterator argument) {
RuntimeIterator left = this.visit(expression.getChildren().get(0), argument);
RuntimeIterator right = this.visit(expression.getChildren().get(1), argument);
if (left instanceof PathExprIterator) {
// We potentially need to atomize
left = new AtomizationIterator(
left,
expression.getStaticContextForRuntime(this.config, this.visitorConfig)
);
}
RuntimeIterator runtimeIterator = new ComparisonIterator(
left,
right,
@@ -1414,4 +1430,56 @@ public RuntimeIterator visitFlowrStatement(FlowrStatement statement, RuntimeIter
runtimeIterator.setStaticContext(statement.getStaticContext());
return runtimeIterator;
}

@Override
public RuntimeIterator visitPathExpr(PathExpr pathExpr, RuntimeIterator argument) {
List<RuntimeIterator> stepExprIterators = new ArrayList<>();
pathExpr.getRelativePathExpressions()
.forEach(relativePathExpr -> stepExprIterators.add(this.visit(relativePathExpr, argument)));
RuntimeIterator getRootIterator = null;
if (pathExpr.needsRoot()) {
getRootIterator = this.visitFunctionCall(pathExpr.getFetchRootFunction(), argument);
}
RuntimeIterator runtimeIterator = new PathExprIterator(
stepExprIterators,
getRootIterator,
new RuntimeStaticContext(
this.config,
pathExpr.getStaticSequenceType(),
pathExpr.getHighestExecutionMode(this.visitorConfig),
pathExpr.getMetadata()
)
);
runtimeIterator.setStaticContext(pathExpr.getStaticContext());
return runtimeIterator;
}

@Override
public RuntimeIterator visitStepExpr(StepExpr stepExpr, RuntimeIterator argument) {
AxisIterator axisIterator = this.visitAxisStep(stepExpr, stepExpr.getMetadata());
NodeTest nodeTest = stepExpr.getNodeTest();
return new StepExprIterator(
axisIterator,
nodeTest,
new RuntimeStaticContext(
this.config,
SequenceType.ITEM,
stepExpr.getHighestExecutionMode(this.visitorConfig),
stepExpr.getMetadata()
)
);
}

private AxisIterator visitAxisStep(StepExpr stepExpr, ExceptionMetadata metadata) {
return stepExpr.accept(
new AxisIteratorVisitor(),
new RuntimeStaticContext(
this.config,
SequenceType.STRING,
ExecutionMode.LOCAL,
metadata
)
);
}

}
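
The visitor above builds each step from an axis iterator plus a node test, and wraps a path in an atomization iterator before a comparison. The sketch below shows the same idea on plain org.w3c.dom nodes, using only the JDK. It is not the PR's AxisIterator, StepExprIterator or AtomizationIterator API: the helpers childAxis, nameTest, step and atomize are hypothetical names, and atomization is approximated here by the node's text content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class StepSketch {

    // Child axis: the direct children of the context node, in document order.
    static List<Node> childAxis(Node context) {
        List<Node> result = new ArrayList<>();
        NodeList children = context.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            result.add(children.item(i));
        }
        return result;
    }

    // Name test: accepts element nodes with the given name.
    static Predicate<Node> nameTest(String name) {
        return node -> node.getNodeType() == Node.ELEMENT_NODE
                && name.equals(node.getNodeName());
    }

    // One step: apply the child axis to every context node, then filter.
    static List<Node> step(List<Node> contexts, Predicate<Node> nodeTest) {
        List<Node> result = new ArrayList<>();
        for (Node context : contexts) {
            for (Node candidate : childAxis(context)) {
                if (nodeTest.test(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    // Stand-in for atomization: reduce a node to its string value.
    static String atomize(Node node) {
        return node.getTextContent();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<library><book>Dune</book><book>Emma</book><dvd>Heat</dvd></library>");
        List<Node> root = new ArrayList<>();
        root.add(doc);

        // Roughly /library/book: two child steps starting from the document node.
        List<Node> books = step(step(root, nameTest("library")), nameTest("book"));
        for (Node book : books) {
            // Nodes are atomized before they are compared with a string.
            System.out.println(atomize(book) + " equals Dune? " + "Dune".equals(atomize(book)));
        }
    }
}
```

Keeping the axis and the node test separate, as the visitor does, means a new axis only has to produce candidate nodes; the filtering logic stays shared across steps.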