Spaces after commas (#2546)
kescobo authored Nov 18, 2020
1 parent d938f9b commit b7ece34
Showing 42 changed files with 1,596 additions and 1,596 deletions.
4 changes: 2 additions & 2 deletions docs/src/lib/indexing.md
@@ -59,7 +59,7 @@ a `SubDataFrame` or a `DataFrameRow` always returns a `DataFrameRow` (which is a

`getindex` on `DataFrame`:
* `df[row, col]` -> the value contained in row `row` of column `col`, the same as `df[!, col][row]`;
-* `df[CartesianIndex(row, col)]` -> the same as `df[row,col]`;
+* `df[CartesianIndex(row, col)]` -> the same as `df[row, col]`;
* `df[row, cols]` -> a `DataFrameRow` with parent `df`;
* `df[rows, col]` -> a copy of the vector `df[!, col]` with only the entries corresponding to `rows` selected,
the same as `df[!, col][rows]`;
@@ -79,7 +79,7 @@ a `SubDataFrame` or a `DataFrameRow` always returns a `DataFrameRow` (which is a

`getindex` on `SubDataFrame`:
* `sdf[row, col]` -> a value contained in row `row` of column `col`;
-* `sdf[CartesianIndex(row, col)]` -> the same as `sdf[row,col]`;
+* `sdf[CartesianIndex(row, col)]` -> the same as `sdf[row, col]`;
* `sdf[row, cols]` -> a `DataFrameRow` with parent `parent(sdf)`;
* `sdf[rows, col]` -> a copy of `sdf[!, col]` with only rows `rows` selected, the same as `sdf[!, col][rows]`;
* `sdf[rows, cols]` -> a `DataFrame` containing columns `cols` and `sdf[rows, col]` as a vector for each `col` in `cols`;
54 changes: 27 additions & 27 deletions docs/src/man/comparisons.md
@@ -33,16 +33,16 @@ as row indices rather than a separate `id` column.

### Accessing data

-| Operation | pandas | DataFrames.jl |
-|:---------------------------|:-----------------------|:-----------------------------------|
-| Cell indexing by location | `df.iloc[1, 1]` | `df[2, 2]` |
-| Row slicing by location | `df.iloc[1:3]` | `df[2:3, :]` |
-| Column slicing by location | `df.iloc[:, 1:]` | `df[:, 2:end]` |
-| Row indexing by label | `df.loc['c']` | `df[findfirst(==('c'), df.id), :]` |
-| Column indexing by label | `df.loc[:, 'x']` | `df[:, :x]` |
-| Column slicing by label | `df.loc[:, ['x','z']]` | `df[:, [:x, :z]]` |
-| | `df.loc[:, 'x':'z']` | `df[:, Between(:x, :z)]` |
-| Mixed indexing | `df.loc['c'][1]` | `df[findfirst(==('c'), df.id), 2]` |
+| Operation | pandas | DataFrames.jl |
+|:---------------------------|:------------------------|:-----------------------------------|
+| Cell indexing by location | `df.iloc[1, 1]` | `df[2, 2]` |
+| Row slicing by location | `df.iloc[1:3]` | `df[2:3, :]` |
+| Column slicing by location | `df.iloc[:, 1:]` | `df[:, 2:end]` |
+| Row indexing by label | `df.loc['c']` | `df[findfirst(==('c'), df.id), :]` |
+| Column indexing by label | `df.loc[:, 'x']` | `df[:, :x]` |
+| Column slicing by label | `df.loc[:, ['x', 'z']]` | `df[:, [:x, :z]]` |
+| | `df.loc[:, 'x':'z']` | `df[:, Between(:x, :z)]` |
+| Mixed indexing | `df.loc['c'][1]` | `df[findfirst(==('c'), df.id), 2]` |

Note that Julia uses 1-based indexing, inclusive on both ends. A special keyword `end` can be used to
indicate the last index. Likewise, the `begin` keyword can be used to indicate the first index.
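
The location-based rows of the table follow from the two conventions: a pandas `iloc` slice is 0-based and excludes its stop, while a Julia range is 1-based and includes both ends. A minimal sketch of that translation in plain Python is shown here. `iloc_to_julia_range` is a hypothetical helper introduced only for illustration (it is not part of pandas or DataFrames.jl), and it assumes non-negative bounds and a step of 1.

```python
from typing import Optional


def iloc_to_julia_range(start: Optional[int], stop: Optional[int]) -> str:
    """Translate a pandas-style `start:stop` slice (0-based, stop excluded)
    into the text of a Julia range (1-based, both ends included).

    Hypothetical helper for illustration; assumes non-negative bounds and step 1.
    """
    # A missing start means "from the first element": `begin` in Julia.
    first = "begin" if start is None else str(start + 1)
    # The excluded 0-based stop is numerically equal to the included 1-based last index.
    last = "end" if stop is None else str(stop)
    return f"{first}:{last}"


print(iloc_to_julia_range(1, 3))     # 2:3
print(iloc_to_julia_range(1, None))  # 2:end
```

Only the start shifts by one: the stop keeps its numeric value because dropping the exclusive end and moving to 1-based counting cancel out, which is why `df.iloc[1:3]` corresponds to `df[2:3, :]`.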
@@ -96,7 +96,7 @@ examples above do not synchronize the column names between pandas and DataFrames
### Mutating operations

| Operation | pandas | DataFrames.jl |
-| :----------------- | :---------------------------------------------------- | :------------------------------------------- |
+|:-------------------|:------------------------------------------------------|:---------------------------------------------|
| Add new columns | `df['z1'] = df['z'] + 1` | `df.z1 = df.z .+ 1` |
| | | `transform!(df, :z => (x -> x .+ 1) => :z1)` |
| | `df.insert(1, 'const', 10)` | `insertcols!(df, 2, :const => 10)` |
@@ -115,12 +115,12 @@ over each group independently. The result of `groupby` is a `GroupedDataFrame` o
which may be processed using the `combine`, `transform`, or `select` functions.
The following table illustrates some common grouping and aggregation usages.

-| Operation | pandas | DataFrames.jl |
-|:--------------------------------|:--------------------------------------------------------------------------------------|:-----------------------------------------------------|
-| Aggregate by groups | `df.groupby('grp')['x'].mean()` | `combine(groupby(df, :grp), :x => mean)` |
-| Rename column after aggregation | `df.groupby('grp')['x'].mean().rename("my_mean")` | `combine(groupby(df, :grp), :x => mean => :my_mean)` |
-| Add aggregated data as column | `df.join(df.groupby('grp')['x'].mean(), on='grp', rsuffix='_mean')` | `transform(groupby(df, :grp), :x => mean)` |
-| ...and select output columns | `df.join(df.groupby('grp')['x'].mean(), on='grp', rsuffix='_mean')[['grp','x_mean']]` | `select(groupby(df, :grp), :id, :x => mean)` |
+| Operation | pandas | DataFrames.jl |
+|:--------------------------------|:---------------------------------------------------------------------------------------|:-----------------------------------------------------|
+| Aggregate by groups | `df.groupby('grp')['x'].mean()` | `combine(groupby(df, :grp), :x => mean)` |
+| Rename column after aggregation | `df.groupby('grp')['x'].mean().rename("my_mean")` | `combine(groupby(df, :grp), :x => mean => :my_mean)` |
+| Add aggregated data as column | `df.join(df.groupby('grp')['x'].mean(), on='grp', rsuffix='_mean')` | `transform(groupby(df, :grp), :x => mean)` |
+| ...and select output columns | `df.join(df.groupby('grp')['x'].mean(), on='grp', rsuffix='_mean')[['grp', 'x_mean']]` | `select(groupby(df, :grp), :id, :x => mean)` |

Note that pandas returns a `Series` object for 1-dimensional result unless `reset_index` is called afterwards.
The corresponding DataFrames.jl examples return an equivalent `DataFrame` object.
@@ -157,11 +157,11 @@ This section includes more complex examples.
|:---------------------------------------|:-----------------------------------------------------------------------------|:----------------------------------------------------------|
| Complex Function | `df[['z']].agg(lambda v: np.mean(np.cos(v)))` | `combine(df, :z => v -> mean(cos, skipmissing(v)))` |
| Aggregate multiple columns | `df.agg({'x': max, 'y': min})` | `combine(df, :x => maximum, :y => minimum)` |
-| | `df[['x','y']].mean()` | `combine(df, [:x, :y] .=> mean)` |
+| | `df[['x', 'y']].mean()` | `combine(df, [:x, :y] .=> mean)` |
| | `df.filter(regex=("^x")).mean()` | `combine(df, names(df, r"^x") .=> mean)` |
-| Apply function over multiple variables | `df.assign(x_y_cor = np.corrcoef(df.x, df.y)[0,1])` | `transform(df, [:x, :y] => cor)` |
+| Apply function over multiple variables | `df.assign(x_y_cor = np.corrcoef(df.x, df.y)[0, 1])` | `transform(df, [:x, :y] => cor)` |
| Row-wise operation | `df.assign(x_y_min = df.apply(lambda v: min(v.x, v.y), axis=1))` | `transform(df, [:x, :y] => ByRow(min))` |
-| | `df.assign(x_y_argmax = df.apply(lambda v: df.columns[v.argmax()], axis=1))` | `transform(df, AsTable([:x,:y]) => ByRow(argmax))` |
+| | `df.assign(x_y_argmax = df.apply(lambda v: df.columns[v.argmax()], axis=1))` | `transform(df, AsTable([:x, :y]) => ByRow(argmax))` |
| DataFrame as input | `df.groupby('grp').head(2)` | `combine(d -> first(d, 2), groupby(df, :grp))` |
| DataFrame as output | `df[['x']].agg(lambda x: [min(x), max(x)])` | `combine(:x => x -> (x = [minimum(x), maximum(x)],), df)` |

@@ -174,7 +174,7 @@ but `select` and `transform` retain an original row ordering.
DataFrames.jl supports join operations similar to a relational database.

| Operation | pandas | DataFrames.jl |
-| :-------------------- | :--------------------------------------------- | :------------------------------ |
+|:----------------------|:-----------------------------------------------|:--------------------------------|
| Inner join | `pd.merge(df, df2, how = 'inner', on = 'grp')` | `innerjoin(df, df2, on = :grp)` |
| Outer join | `pd.merge(df, df2, how = 'outer', on = 'grp')` | `outerjoin(df, df2, on = :grp)` |
| Left join | `pd.merge(df, df2, how = 'left', on = 'grp')` | `leftjoin(df, df2, on = :grp)` |
@@ -198,7 +198,7 @@ df <- tibble(grp = rep(1:2, 3), x = 6:1, y = 4:9,
```

| Operation | dplyr | DataFrames.jl |
-| :----------------------- | :----------------------------- | :------------------------------------- |
+|:-------------------------|:-------------------------------|:---------------------------------------|
| Reduce multiple values | `summarize(df, mean(x))` | `combine(df, :x => mean)` |
| Add new columns | `mutate(df, x_mean = mean(x))` | `transform(df, :x => mean => :x_mean)` |
| Rename columns | `rename(df, x_new = x)` | `rename(df, :x => :x_new)` |
@@ -210,15 +210,15 @@ df <- tibble(grp = rep(1:2, 3), x = 6:1, y = 4:9,
As in dplyr, some of these functions can be applied to grouped data frames, in which case they operate by group:

| Operation | dplyr | DataFrames.jl |
-| :----------------------- | :----------------------------------------- | :------------------------------------------ |
+|:-------------------------|:-------------------------------------------|:--------------------------------------------|
| Reduce multiple values | `summarize(group_by(df, grp), mean(x))` | `combine(groupby(df, :grp), :x => mean)` |
| Add new columns | `mutate(group_by(df, grp), mean(x))` | `transform(groupby(df, :grp), :x => mean)` |
| Pick & transform columns | `transmute(group_by(df, grp), mean(x), y)` | `select(groupby(df, :grp), :x => mean, :y)` |

The table below compares more advanced commands:

| Operation | dplyr | DataFrames.jl |
-| :------------------------ | :-------------------------------------------------------- | :------------------------------------------------------------ |
+|:--------------------------|:----------------------------------------------------------|:--------------------------------------------------------------|
| Complex Function | `summarize(df, mean(x, na.rm = T))` | `combine(df, :x => x -> mean(skipmissing(x)))` |
| Transform several columns | `summarize(df, max(x), min(y))` | `combine(df, :x => maximum, :y => minimum)` |
| | `summarize(df, across(c(x, y), mean))` | `combine(df, [:x, :y] .=> mean)` |
@@ -235,7 +235,7 @@ The table below compares more advanced commands:
The following table compares the main functions of DataFrames.jl with Stata:

| Operation | Stata | DataFrames.jl |
-| :--------------------- | :---------------------- | :-------------------------------------- |
+|:-----------------------|:------------------------|:----------------------------------------|
| Reduce multiple values | `collapse (mean) x` | `combine(df, :x => mean)` |
| Add new columns | `egen x_mean = mean(x)` | `transform!(df, :x => mean => :x_mean)` |
| Rename columns | `rename x x_new` | `rename!(df, :x => :x_new)` |
@@ -248,14 +248,14 @@ Note that the suffix `!` (i.e. `transform!`, `select!`, etc) ensures that the op
Some of these functions can be applied to grouped data frames, in which case they operate by group:

| Operation | Stata | DataFrames.jl |
-| :--------------------- | :------------------------------- | :------------------------------------------ |
+|:-----------------------|:---------------------------------|:--------------------------------------------|
| Add new columns | `egen x_mean = mean(x), by(grp)` | `transform!(groupby(df, :grp), :x => mean)` |
| Reduce multiple values | `collapse (mean) x, by(grp)` | `combine(groupby(df, :grp), :x => mean)` |

The table below compares more advanced commands:

| Operation | Stata | DataFrames.jl |
-| :------------------------ | :----------------------------- | :--------------------------------------------------------- |
+|:--------------------------|:-------------------------------|:-----------------------------------------------------------|
| Transform certain rows | `replace x = 0 if x <= 0` | `transform(df, :x => (x -> ifelse.(x .<= 0, 0, x)) => :x)` |
| Transform several columns | `collapse (max) x (min) y` | `combine(df, :x => maximum, :y => minimum)` |
| | `collapse (mean) x y` | `combine(df, [:x, :y] .=> mean)` |
2 changes: 1 addition & 1 deletion docs/src/man/getting_started.md
@@ -267,7 +267,7 @@ A particular common case of a collection that supports the
[Tables.jl](https://github.com/JuliaData/Tables.jl) interface is
a vector of `NamedTuple`s:
```
-julia> v = [(a=1,b=2), (a=3,b=4)]
+julia> v = [(a=1, b=2), (a=3, b=4)]
2-element Array{NamedTuple{(:a, :b),Tuple{Int64,Int64}},1}:
(a = 1, b = 2)
(a = 3, b = 4)