Encoding

How to convert the body of a decoded program into bytes.

Parameters

decoded_program: &String: A reference to the decoded program as a String.
tokens: &Map: A reference to a map containing the token values. This map is used to find the byte sequences for each token.
perform_normalize: bool: A boolean flag indicating whether to normalize the input text before encoding.
display_mode: DisplayMode: An enum specifying the display mode for the text. The available modes are Pretty, Accessible, and TiAscii.
encode_mode: EncodeMode: An enum specifying the encoding mode. The available modes are Min, Max, and Smart.

Return Value

The function returns a Vec<u8> containing the encoded bytestream.

Encoding Process

Initialization:
- An empty Vec<u8> (encoded_program) is created to hold the final encoded byte sequence.
Normalization:
- If perform_normalize is true, the function normalizes the input decoded_program using the normalize function.
- If perform_normalize is false, the input decoded_program is used as-is.
Line Processing:
- The function iterates through each line of the normalized (or original) decoded_program.
- An EncodeState struct is used to track the current state during the encoding process.
Token Matching:
- Tokens are found based on encode_mode:
  - EncodeMode::Max: Finds the longest matching token.
  - EncodeMode::Min: Finds the shortest matching token.
  - EncodeMode::Smart:
    - Finds the longest matching token.
    - If Prgm or ʟ is found, it switches to EncodeMode::Min until the line ends, a string is found, or a non-alphabetic character is found.
    - If '"' is found, it switches to EncodeMode::Min until the next '"' is found.
      - If the token before it is Send(, it switches to EncodeMode::Min.
    - If -> or → is found, the mode resets to EncodeMode::Smart after the token is found.
  - The token's byte sequence is added to encoded_program.
  - The token's value is removed from the line being processed.
  - If the token is a double-quote ("), the in_string flag is toggled.
- If no token is found, an error message is printed, and the program exits.
New Line Handling:
- After processing each line, a new line byte (0x3F) is added to encoded_program.
- The last new line byte is removed before returning the encoded program.
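
The Min and Max matching and the newline handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the token map is assumed to be a HashMap from token text to byte sequence, match_token and encode_lines are hypothetical names, and Smart mode, normalization, and the EncodeState struct are left out. Unlike the behaviour described above, an unmatched token is returned as an Err instead of being printed before exiting, which keeps the sketch easy to test.

```rust
use std::collections::HashMap;

/// Matching strategy; Smart mode is left out of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EncodeMode {
    Min,
    Max,
}

/// Assumed shape of the token map: token text -> byte sequence.
type TokenMap = HashMap<String, Vec<u8>>;

/// Hypothetical helper: pick the token that prefixes `rest`,
/// longest for Max and shortest for Min.
fn match_token<'a>(
    rest: &str,
    tokens: &'a TokenMap,
    mode: EncodeMode,
) -> Option<(&'a str, &'a [u8])> {
    let candidates = tokens
        .iter()
        .filter(|(text, _)| !text.is_empty() && rest.starts_with(text.as_str()));
    let best = match mode {
        EncodeMode::Max => candidates.max_by_key(|(text, _)| text.len()),
        EncodeMode::Min => candidates.min_by_key(|(text, _)| text.len()),
    };
    best.map(|(text, bytes)| (text.as_str(), bytes.as_slice()))
}

/// Hypothetical driver: encode each line token by token, append the
/// newline byte 0x3F after every line, then drop the trailing one.
fn encode_lines(
    program: &str,
    tokens: &TokenMap,
    mode: EncodeMode,
) -> Result<Vec<u8>, String> {
    let mut encoded_program = Vec::new();
    for line in program.lines() {
        let mut rest = line;
        while !rest.is_empty() {
            let (text, bytes) = match_token(rest, tokens, mode)
                .ok_or_else(|| format!("no token matches {:?}", rest))?;
            encoded_program.extend_from_slice(bytes);
            // Remove the matched token text from the front of the line.
            rest = &rest[text.len()..];
        }
        encoded_program.push(0x3F);
    }
    // Remove the newline byte added after the last line (no-op if empty).
    encoded_program.pop();
    Ok(encoded_program)
}
```

With a map that contains the tokens A, B, and AB, encoding the line AB under EncodeMode::Max emits the bytes of AB alone, while EncodeMode::Min emits the bytes of A followed by those of B.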

Normalization

The normalize function replaces three Unicode theta variants (U+0398, U+03F4, and U+1DBF) with θ:

fn normalize(string: &str) -> String {
    // Each replace call handles one theta look-alike.
    string
        .replace('\u{0398}', "θ")
        .replace('\u{03F4}', "θ")
        .replace('\u{1DBF}', "θ")
}
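
For comparison, the same mapping can be written as a single pass over the characters. This is only a sketch: normalize_chars is a hypothetical name, and it assumes the θ in the replacement strings above is U+03B8 (Greek small letter theta).

```rust
/// Hypothetical single-pass variant of the normalization step.
/// Assumes the target character is U+03B8 (Greek small letter theta).
fn normalize_chars(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            // Capital theta, capital theta symbol, modifier small theta.
            '\u{0398}' | '\u{03F4}' | '\u{1DBF}' => '\u{03B8}',
            other => other,
        })
        .collect()
}
```

Text that contains none of the three variants passes through unchanged.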

Key Conversion

The convert_key_to_bytes function converts a token key (in string format: $XX or $XX$XX) to a vector of bytes:

fn convert_key_to_bytes(key: &str) -> Vec<u8> {
    // Remove any " en" marker from the key.
    let key = key.replace(" en", "");
    // Split on '$' and skip the piece before the first '$'.
    let keys = key.split('$').skip(1);
    let mut bytes = Vec::new();

    for key in keys {
        // Each piece is a hex byte such as "BB"; invalid hex panics here.
        let byte = u8::from_str_radix(key, 16).unwrap();
        bytes.push(byte);
    }

    bytes
}
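
Because the unwrap call panics on a malformed key, a non-panicking variant may be useful when keys come from an untrusted token sheet. The sketch below is an assumption for illustration, not part of the project: try_convert_key_to_bytes is a hypothetical name, and it returns the parse error instead of panicking.

```rust
use std::num::ParseIntError;

/// Hypothetical fallible variant of the key conversion: parses a key
/// such as "$XX" or "$XX$XX" and reports invalid hex as an error.
fn try_convert_key_to_bytes(key: &str) -> Result<Vec<u8>, ParseIntError> {
    key.replace(" en", "")
        .split('$')
        // Skip the piece before the first '$'.
        .skip(1)
        // Parse each remaining piece as a hexadecimal byte.
        .map(|hex| u8::from_str_radix(hex, 16))
        // Collecting into a Result stops at the first parse error.
        .collect()
}
```

A two-byte key such as $BB$2A yields two bytes in order, while a key with non-hex digits yields an Err that the caller can report.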

Error Handling

The function handles one error scenario:

- If no matching token is found for a part of the line, an error message is printed, and the program exits.
