josercarmo/shiva (forked from igumnoff/shiva)

Shiva


Shiva library: Implementation in Rust of a parser and generator for documents of any type

Features

  • Common Document Model (CDM) for all document types
  • Parsers produce CDM
  • Generators consume CDM
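Because every parser targets the same model and every generator reads from it, supporting one more format makes it convertible to and from all the others. The std-only sketch below illustrates that pivot. `Element`, `Document`, `parse_markdown`, `generate_html`, `generate_text` and `convert` are hypothetical simplifications for illustration, not Shiva's actual types or API.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified stand-in for the Common Document Model.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Header { level: u8, text: String },
    Paragraph(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Document {
    elements: Vec<Element>,
}

// A format contributes a parser (text -> CDM) and/or a generator (CDM -> text).
type Parser = fn(&str) -> Document;
type Generator = fn(&Document) -> String;

// Toy Markdown parser: "# " lines become headers, other non-empty lines paragraphs.
fn parse_markdown(src: &str) -> Document {
    let elements = src
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| {
            let level = l.chars().take_while(|&c| c == '#').count();
            if level > 0 && l[level..].starts_with(' ') {
                Element::Header { level: level as u8, text: l[level + 1..].to_string() }
            } else {
                Element::Paragraph(l.to_string())
            }
        })
        .collect();
    Document { elements }
}

// Toy HTML generator: one tag per element.
fn generate_html(doc: &Document) -> String {
    doc.elements
        .iter()
        .map(|e| match e {
            Element::Header { level, text } => format!("<h{level}>{text}</h{level}>"),
            Element::Paragraph(text) => format!("<p>{text}</p>"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

// Toy plain-text generator: keeps the text, drops header levels.
fn generate_text(doc: &Document) -> String {
    doc.elements
        .iter()
        .map(|e| match e {
            Element::Header { text, .. } | Element::Paragraph(text) => text.clone(),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

// Any registered source reaches any registered target through the CDM.
fn convert(
    parsers: &HashMap<&str, Parser>,
    generators: &HashMap<&str, Generator>,
    from: &str,
    to: &str,
    input: &str,
) -> Option<String> {
    let doc = parsers.get(from)?(input);
    Some(generators.get(to)?(&doc))
}

fn main() {
    let parsers: HashMap<&str, Parser> = HashMap::from([("markdown", parse_markdown as Parser)]);
    let generators: HashMap<&str, Generator> = HashMap::from([
        ("html", generate_html as Generator),
        ("text", generate_text as Generator),
    ]);
    // One parser, two generators: Markdown reaches both targets via the same model.
    for target in ["html", "text"] {
        let out = convert(&parsers, &generators, "markdown", target, "# Title\nHello").unwrap();
        println!("{target}:\n{out}");
    }
}
```

With N parsers and M generators this yields up to N × M conversions while only N + M components are written; registering one more generator is enough to add a new target.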

Common Document Model

[Diagram: Common Document Model]

Supported document types

Document type  Parse  Generate
Plain text     +      +
Markdown       +      +
HTML           +      +
PDF            +      +
JSON           +      +
XML            +      +
CSV            +      +
RTF            +      +
DOCX           +      +
XLS            +      -
XLSX           +      +
ODS            +      +
Typst          -      +
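Read as data, the table tells you which conversions are possible: the source format needs a parser and the target format a generator, so XLS can only be an input and Typst only an output. The hypothetical helper below encodes the table rows with lowercase format names chosen for this sketch; it is not part of Shiva.

```rust
// (format, can parse, can generate), copied from the "Supported document types" table.
// The lowercase names are an assumption made for this sketch.
const SUPPORT: &[(&str, bool, bool)] = &[
    ("text", true, true),
    ("markdown", true, true),
    ("html", true, true),
    ("pdf", true, true),
    ("json", true, true),
    ("xml", true, true),
    ("csv", true, true),
    ("rtf", true, true),
    ("docx", true, true),
    ("xls", true, false),
    ("xlsx", true, true),
    ("ods", true, true),
    ("typst", false, true),
];

// Returns (can parse, can generate) for a known format.
fn lookup(format: &str) -> Option<(bool, bool)> {
    SUPPORT
        .iter()
        .find(|(name, _, _)| *name == format)
        .map(|&(_, parse, generate)| (parse, generate))
}

/// A conversion needs a parser for the source and a generator for the target.
fn can_convert(from: &str, to: &str) -> bool {
    matches!(lookup(from), Some((true, _))) && matches!(lookup(to), Some((_, true)))
}

fn main() {
    for (from, to) in [("xls", "xlsx"), ("xlsx", "xls"), ("markdown", "typst"), ("typst", "pdf")] {
        println!("{from} -> {to}: {}", can_convert(from, to));
    }
}
```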

Parse document features

Document type  Header  Paragraph  List  Table  Image  Hyperlink  PageHeader  PageFooter
Plain text     -       +          -     -      -      -          -           -
Markdown       +       +          +     +      +      +          -           -
HTML           +       +          +     +      +      +          -           -
PDF            -       +          +     -      -      -          -           -
DOCX           +       +          +     +      -      +          -           -
RTF            +       +          +     +      -      +          +           +
JSON           +       +          +     +      -      +          +           +
XML            +       +          +     +      +      +          +           +
CSV            -       -          -     +      -      -          -           -
XLS            -       -          -     +      -      -          -           -
XLSX           -       -          -     +      -      -          -           -
ODS            -       -          -     +      -      -          -           -

Generate document features

Document type  Header  Paragraph  List  Table  Image  Hyperlink  PageHeader  PageFooter
Plain text     +       +          +     +      -      +          +           +
Markdown       +       +          +     +      +      +          +           +
HTML           +       +          +     +      +      +          -           -
PDF            +       +          +     +      +      +          +           +
DOCX           +       +          +     +      +      +          -           -
RTF            +       +          +     +      +      +          -           -
JSON           +       +          +     +      -      +          +           +
XML            +       +          +     +      +      +          +           +
CSV            -       -          -     +      -      -          -           -
XLSX           -       -          -     +      -      -          -           -
ODS            -       -          -     +      -      -          -           -
Typst          +       +          +     +      +      +          +           +
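Combining the two feature tables gives a rough estimate of what survives a conversion: an element is kept only if the source parser and the target generator both support it. The hypothetical helper below copies the table rows as `+`/`-` strings; it reflects only what the tables claim, and real output may differ in detail.

```rust
const FEATURES: [&str; 8] = [
    "Header", "Paragraph", "List", "Table", "Image", "Hyperlink", "PageHeader", "PageFooter",
];

// Rows copied from "Parse document features" ('+' = supported, in FEATURES order).
const PARSE: &[(&str, &str)] = &[
    ("text", "-+------"),
    ("markdown", "++++++--"),
    ("html", "++++++--"),
    ("pdf", "-++-----"),
    ("docx", "++++-+--"),
    ("rtf", "++++-+++"),
    ("json", "++++-+++"),
    ("xml", "++++++++"),
    ("csv", "---+----"),
    ("xls", "---+----"),
    ("xlsx", "---+----"),
    ("ods", "---+----"),
];

// Rows copied from "Generate document features".
const GENERATE: &[(&str, &str)] = &[
    ("text", "++++-+++"),
    ("markdown", "++++++++"),
    ("html", "++++++--"),
    ("pdf", "++++++++"),
    ("docx", "++++++--"),
    ("rtf", "++++++--"),
    ("json", "++++-+++"),
    ("xml", "++++++++"),
    ("csv", "---+----"),
    ("xlsx", "---+----"),
    ("ods", "---+----"),
    ("typst", "++++++++"),
];

fn row(table: &[(&str, &'static str)], format: &str) -> Option<&'static str> {
    table.iter().find(|(name, _)| *name == format).map(|&(_, r)| r)
}

/// Features parsed by `from` AND generated by `to`; None if either side is unsupported.
fn surviving_features(from: &str, to: &str) -> Option<Vec<&'static str>> {
    let (p, g) = (row(PARSE, from)?, row(GENERATE, to)?);
    Some(
        FEATURES
            .into_iter()
            .zip(p.chars().zip(g.chars()))
            .filter(|(_, (a, b))| *a == '+' && *b == '+')
            .map(|(name, _)| name)
            .collect(),
    )
}

fn main() {
    for (from, to) in [("html", "markdown"), ("docx", "rtf"), ("pdf", "markdown")] {
        println!("{from} -> {to}: {:?}", surviving_features(from, to));
    }
}
```

For example, DOCX to RTF keeps hyperlinks but not images, because the DOCX parser does not read images.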

Using the Shiva library

Cargo.toml

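[dependencies]
shiva = { version = "1.4.6", features = ["html", "markdown", "text", "pdf", "json",
    "csv", "rtf", "docx", "xml", "xls", "xlsx", "ods", "typst"] }
bytes = "1"   # main.rs constructs bytes::Bytes directly
anyhow = "1"  # error type returned by Shiva's transformers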

main.rs

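use shiva::core::TransformerTrait; // brings parse() and generate() into scope

// Converts input.html to out.md; needs the `bytes` and `anyhow` crates as dependencies.
fn main() -> anyhow::Result<()> {
    // Read the HTML source and wrap it in the Bytes type Shiva expects.
    let input_bytes = bytes::Bytes::from(std::fs::read("input.html")?);
    // Parse HTML into the Common Document Model.
    let document = shiva::html::Transformer::parse(&input_bytes)?;
    // Generate Markdown from that model.
    let output_bytes = shiva::markdown::Transformer::generate(&document)?;
    std::fs::write("out.md", output_bytes)?;
    Ok(())
}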

Shiva CLI & Server

Build the Shiva CLI and Shiva Server executables

git clone https://github.com/igumnoff/shiva.git
cd shiva/cli
cargo build --release

Run the Shiva CLI

cd ./target/release/
./shiva README.markdown README.html
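The example passes only file names, which suggests that the CLI infers the input and output formats from file extensions; the README does not state the rules. The sketch below shows one plausible mapping. `format_from_path` and the format names are hypothetical, and the real CLI may accept different extensions.

```rust
use std::path::Path;

/// Hypothetical extension-to-format mapping (case-insensitive).
fn format_from_path(path: &str) -> Option<&'static str> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    let format = match ext.as_str() {
        "txt" => "text",
        "md" | "markdown" => "markdown",
        "html" | "htm" => "html",
        "pdf" => "pdf",
        "json" => "json",
        "xml" => "xml",
        "csv" => "csv",
        "rtf" => "rtf",
        "docx" => "docx",
        "xls" => "xls",
        "xlsx" => "xlsx",
        "ods" => "ods",
        "typ" => "typst",
        _ => return None,
    };
    Some(format)
}

fn main() {
    // Same call shape as the CLI example: <input> <output>
    let args: Vec<String> = std::env::args().skip(1).collect();
    if args.len() != 2 {
        eprintln!("usage: <input> <output>");
        return;
    }
    match (format_from_path(&args[0]), format_from_path(&args[1])) {
        (Some(from), Some(to)) => println!("{} ({from}) -> {} ({to})", args[0], args[1]),
        _ => eprintln!("unrecognized file extension"),
    }
}
```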

Run the Shiva Server

cd ./target/release/
./shiva-server --port=8080 --host=127.0.0.1

Who uses Shiva

Contributing

I would love to see contributions from the community. If you experience bugs, feel free to open an issue. If you would like to implement a new feature or fix a bug, please follow these steps:

  1. Fork the repository
  2. Comment on the issue to say that you are going to work on it
  3. Create a pull request

If you would like to add a new document type, you need to implement the following traits:

Required: shiva::core::TransformerTrait

pub trait TransformerTrait {
    fn parse(document: &Bytes) -> anyhow::Result<Document>;
    fn generate(document: &Document) -> anyhow::Result<Bytes>;
}
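As a starting point for a new format, the sketch below implements a trait of the same shape for a hypothetical "one paragraph per line" text format. So that it compiles without Shiva, it uses stand-ins: `Vec<u8>` for `Bytes`, a one-field `Document` instead of the real CDM, and `Result<T, String>` instead of `anyhow::Result`. A real contribution would implement `shiva::core::TransformerTrait` against Shiva's own types.

```rust
// Stand-ins so the sketch compiles with std only; Shiva uses
// bytes::Bytes, its own Document type and anyhow::Result.
type Bytes = Vec<u8>;
type Result<T> = std::result::Result<T, String>;

#[derive(Debug, PartialEq)]
struct Document {
    paragraphs: Vec<String>, // hypothetical: the real CDM has richer elements
}

// Same shape as the trait above.
trait TransformerTrait {
    fn parse(document: &Bytes) -> Result<Document>;
    fn generate(document: &Document) -> Result<Bytes>;
}

/// Hypothetical format: UTF-8 text, one paragraph per non-empty line.
struct LinesTransformer;

impl TransformerTrait for LinesTransformer {
    fn parse(document: &Bytes) -> Result<Document> {
        // Reject input that is not valid UTF-8 instead of panicking.
        let text = std::str::from_utf8(document).map_err(|e| e.to_string())?;
        let paragraphs = text
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty())
            .map(String::from)
            .collect();
        Ok(Document { paragraphs })
    }

    fn generate(document: &Document) -> Result<Bytes> {
        // One paragraph per line, with a trailing newline.
        let mut out = document.paragraphs.join("\n");
        out.push('\n');
        Ok(out.into_bytes())
    }
}

fn main() {
    let input: Bytes = b"first\n\n  second  \n".to_vec();
    let doc = LinesTransformer::parse(&input).unwrap();
    let output = LinesTransformer::generate(&doc).unwrap();
    print!("{}", String::from_utf8(output).unwrap());
}
```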

Optional: shiva::core::TransformerWithImageLoaderSaverTrait (for formats whose images are stored outside the document, for example HTML and Markdown)

pub trait TransformerWithImageLoaderSaverTrait {
    fn parse_with_loader<F>(document: &Bytes,  image_loader: F) -> anyhow::Result<Document>
        where F: Fn(&str) -> anyhow::Result<Bytes>;
    fn generate_with_saver<F>(document: &Document,  image_saver: F) -> anyhow::Result<Bytes>
        where F: Fn(&Bytes, &str) -> anyhow::Result<()>;
}
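The loader and saver are plain closures, so the caller decides where images live. A minimal sketch, assuming images sit in a directory next to the document and that image paths in the document are relative: `dir_image_loader` and `dir_image_saver` are hypothetical helpers, and `Vec<u8>` and `Result<T, String>` stand in for `Bytes` and `anyhow::Result` so the code needs only std.

```rust
use std::fs;
use std::path::{Component, Path, PathBuf};

// Stand-ins so the sketch needs only std; Shiva uses bytes::Bytes and anyhow::Result.
type Bytes = Vec<u8>;
type Result<T> = std::result::Result<T, String>;

// Joins a document-relative image path onto `base_dir`, refusing empty paths,
// absolute paths and `..` so a document cannot reach files outside that directory.
fn resolve(base_dir: &Path, image_path: &str) -> Result<PathBuf> {
    let rel = Path::new(image_path);
    let safe = rel
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));
    if !safe || image_path.is_empty() {
        return Err(format!("unsafe image path: {image_path:?}"));
    }
    Ok(base_dir.join(rel))
}

/// Hypothetical loader to pass as `image_loader`: reads images relative to `base_dir`.
fn dir_image_loader(base_dir: PathBuf) -> impl Fn(&str) -> Result<Bytes> {
    move |image_path: &str| -> Result<Bytes> {
        let path = resolve(&base_dir, image_path)?;
        fs::read(&path).map_err(|e| format!("{}: {e}", path.display()))
    }
}

/// Hypothetical saver to pass as `image_saver`: writes images under `base_dir`.
fn dir_image_saver(base_dir: PathBuf) -> impl Fn(&Bytes, &str) -> Result<()> {
    move |data: &Bytes, image_path: &str| -> Result<()> {
        let path = resolve(&base_dir, image_path)?;
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent).map_err(|e| e.to_string())?;
        }
        fs::write(&path, data).map_err(|e| e.to_string())
    }
}

fn main() {
    let dir = std::env::temp_dir().join("shiva_image_sketch");
    let save = dir_image_saver(dir.clone());
    let load = dir_image_loader(dir.clone());
    save(&vec![0x89, b'P', b'N', b'G'], "images/logo.png").unwrap();
    assert_eq!(load("images/logo.png").unwrap(), vec![0x89, b'P', b'N', b'G']);
    assert!(load("../outside.png").is_err());
    let _ = fs::remove_dir_all(&dir);
}
```

Rejecting `..` and absolute paths matters because image paths come from the document being converted, which may not be trusted.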

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Shiva by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

