Skip to content

Latest commit

 

History

History
157 lines (145 loc) · 2.95 KB

README.md

File metadata and controls

157 lines (145 loc) · 2.95 KB

dfsql

  • Revision: the standalone count command is replaced with len, so make sure to replace (count) and col "count" with len and col "len" respectively.
    • the unary count <col> command is unaffected.

Install

cargo install dfsql

How to run

dfsql --input your.csv --output a-new.csv
# ...or
dfsql -i your.csv -o a-new.csv

REPL

  • exit/quit: exit the REPL loop.
    exit
  • undo: undo the previous successful operation.
    undo
  • reset: reset all the changes and go back to the original data frame.
    reset
  • schema: show column names and types of the data frame.
    schema
  • save: save the current data frame to a file.
    save a-new.csv

Statements

  • select
    select <expr>*
    select last_name first_name
    • Select columns "last_name" and "first_name" and collect them into a data frame.
  • Group by
    group (<col> | <var>)* agg <expr>*
    group first_name agg (count)
    • Group the data frame by column "first_name" and then aggregate each group with the count of the members.
  • filter
    filter <expr>
    filter first_name = "John"
  • limit
    limit <int>
    limit 5
  • reverse
    reverse
  • sort
    sort ((asc | desc | ()) <col>)*
    sort icpsr_id
  • use
    use <var>
    use other
    • Switch to the data frame called other.
  • join
    (left | right | inner | full) join <var> on <col> <col>?
    left join other on id ID
    • left join the data frame called other on my column id and its column ID

Expressions

  • col: reference to a column.
    col : (<str> | <var>) -> <expr>
    select col first_name
  • exclude: remove columns from the data frame.
    exclude : <expr>* -> <expr>
    select exclude last_name first_name
  • literal: literal values like 42, "John", 1.0, and null.
  • binary operations
    select a * b
    • Calculate the product of columns "a" and "b" and collect the result.
  • unary operations
    select -a
    select sum a
    • Sum all values in column "a" and collect the scalar result.
  • alias: assign a name to a column.
    alias : (<col> | <var>) <expr> -> <expr>
    select alias product a * b
    • Assign the name "product" to the product and collect the new column.
  • conditional
    <conditional> : if <expr> then <expr> (if <expr> then <expr>)* otherwise <expr> -> <expr>
    select if class = 0 then "A" if class = 1 then "B" else null
  • cast: cast a column to either type str, int, or float.
    cast : <type> <expr> -> <expr>
    select cast str id
    • Cast the column "id" to type str and collect the result.