Encoding
cqb13 edited this page Jul 10, 2024 · 2 revisions
How to convert the body of a decoded program into bytes.
- `decoded_program: &String`: A reference to the decoded program as a `String`.
- `tokens: &Map`: A reference to a map containing the token values. This map is used to find the byte sequences for each token.
- `perform_normalize: bool`: A boolean flag indicating whether to normalize the input text before encoding.
- `display_mode: DisplayMode`: An enum specifying the display mode for the text. The available modes are `Pretty`, `Accessible`, and `TiAscii`.
- `encode_mode: EncodeMode`: An enum specifying the encoding mode. The available modes are `Min`, `Max`, and `Smart`.
The function returns a `Vec<u8>` containing the encoded byte stream.
- Initialization:
  - An empty `Vec<u8>` (`encoded_program`) is created to hold the final encoded byte sequence.
- Normalization:
  - If `perform_normalize` is `true`, the function normalizes the input `decoded_program` using the `normalize` function.
  - If `perform_normalize` is `false`, the input `decoded_program` is used as-is.
- Line Processing:
  - The function iterates through each line of the normalized (or original) `decoded_program`.
  - An `EncodeState` struct is used to track the current state during the encoding process.
- Token Matching:
  - Tokens are found based on `encode_mode`:
    - `EncodeMode::Max`: Finds the longest matching token.
    - `EncodeMode::Min`: Finds the shortest matching token.
    - `EncodeMode::Smart`:
      - Finds the longest matching token.
      - If `Prgm` or `ʟ` is found, it switches to `EncodeMode::Min` until the line ends, a string is found, or a non-alphabetic character is found.
      - If `"` is found, it switches to `EncodeMode::Min` until the next `"` is found.
        - If the token before it is `Send(`, it switches to `EncodeMode::Min`.
      - If `->` or `→` is found, the mode resets to `EncodeMode::Smart` after the token is found.
  - The token's byte sequence is added to `encoded_program`.
  - The token's value is removed from the line being processed.
  - If the token is a double quote (`"`), the `in_string` flag is toggled.
  - If no token is found, an error message is printed, and the program exits.
- New Line Handling:
  - After processing each line, a new line byte (`0x3F`) is added to `encoded_program`.
  - The last new line byte is removed before returning the encoded program.
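The steps above can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: `SketchMode`, `find_token`, and `encode_sketch` are hypothetical names, the token table is assumed to be a plain `HashMap<String, Vec<u8>>`, `Smart` mode and normalization are left out, and an error is returned instead of printing a message and exiting.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `EncodeMode` (hypothetical; `Smart` is omitted).
#[derive(Clone, Copy, PartialEq, Debug)]
enum SketchMode {
    Min,
    Max,
}

/// Finds the token that prefixes `rest`: the longest one for `Max`,
/// the shortest one for `Min`. Returns the token text and its bytes.
fn find_token<'a>(
    rest: &str,
    tokens: &'a HashMap<String, Vec<u8>>,
    mode: SketchMode,
) -> Option<(&'a str, &'a [u8])> {
    let candidates = tokens
        .iter()
        .filter(|(text, _)| !text.is_empty() && rest.starts_with(text.as_str()));
    let best = match mode {
        SketchMode::Max => candidates.max_by_key(|(text, _)| text.len()),
        SketchMode::Min => candidates.min_by_key(|(text, _)| text.len()),
    };
    best.map(|(text, bytes)| (text.as_str(), bytes.as_slice()))
}

/// Encodes `program` line by line, separating lines with 0x3F.
fn encode_sketch(
    program: &str,
    tokens: &HashMap<String, Vec<u8>>,
    mode: SketchMode,
) -> Result<Vec<u8>, String> {
    let mut encoded_program = Vec::new();
    for line in program.lines() {
        let mut rest = line;
        while !rest.is_empty() {
            // Stop with an error as soon as nothing in the table matches.
            let (text, bytes) = find_token(rest, tokens, mode)
                .ok_or_else(|| format!("no token matches {:?}", rest))?;
            encoded_program.extend_from_slice(bytes);
            // Remove the matched token's text from the front of the line.
            rest = &rest[text.len()..];
        }
        // A new line byte follows every line.
        encoded_program.push(0x3F);
    }
    // Drop the new line byte added after the last line.
    encoded_program.pop();
    Ok(encoded_program)
}
```

With a made-up table that maps `a`, `ab`, and `b` to placeholder bytes `0x01`, `0x02`, and `0x03`, the input `"ab\nb"` encodes to `[0x02, 0x3F, 0x03]` in `Max` mode and to `[0x01, 0x03, 0x3F, 0x03]` in `Min` mode.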
The `normalize` function replaces Unicode theta variants (U+0398, U+03F4, and U+1DBF) with θ:

    fn normalize(string: &str) -> String {
        // Map each theta variant to the single θ form.
        string
            .replace('\u{0398}', "θ")
            .replace('\u{03F4}', "θ")
            .replace('\u{1DBF}', "θ")
    }
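For comparison, the same mapping could be done in one pass over the characters instead of three `replace` calls. This is only a sketch: `normalize_single_pass` is a hypothetical name, and it assumes the replacement character θ is U+03B8 (Greek small letter theta).

```rust
/// Single-pass variant of the theta normalization (hypothetical helper).
/// Assumes the target character is U+03B8, Greek small letter theta.
fn normalize_single_pass(string: &str) -> String {
    string
        .chars()
        .map(|c| match c {
            // Theta variants handled by the original `normalize`.
            '\u{0398}' | '\u{03F4}' | '\u{1DBF}' => '\u{03B8}',
            other => other,
        })
        .collect()
}
```

Calling `normalize_single_pass("\u{0398}+1")` yields a string whose first character is U+03B8, and text without any theta variant comes back unchanged.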
The `convert_key_to_bytes` function converts a token key (in string format: `$XX` or `$XX$XX`) to a vector of bytes:

    fn convert_key_to_bytes(key: &str) -> Vec<u8> {
        // Strip the " en" marker from the key, if present.
        let key = key.replace(" en", "");
        // Splitting "$XX$YY" on '$' yields ["", "XX", "YY"]; skip the empty first piece.
        let keys = key.split('$').skip(1);
        let mut bytes = Vec::new();
        for key in keys {
            // Each piece is a two-digit hex byte; invalid hex panics here.
            let byte = u8::from_str_radix(key, 16).unwrap();
            bytes.push(byte);
        }
        bytes
    }
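Because `unwrap` panics on a malformed key, a fallible variant may be useful when the token map comes from an untrusted file. The following is a sketch under that assumption: `try_convert_key_to_bytes` is a hypothetical name and not part of the original code.

```rust
use std::num::ParseIntError;

/// Fallible variant of `convert_key_to_bytes` (hypothetical helper).
/// Returns the parse error instead of panicking on invalid hex.
fn try_convert_key_to_bytes(key: &str) -> Result<Vec<u8>, ParseIntError> {
    key.replace(" en", "")
        .split('$')
        .skip(1) // the piece before the first '$' is empty
        .map(|hex| u8::from_str_radix(hex, 16))
        .collect() // stops at the first parse error
}
```

For example, `try_convert_key_to_bytes("$BB$6A")` returns `Ok(vec![0xBB, 0x6A])`, while a key such as `"$ZZ"` returns an `Err` rather than aborting the program.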
The function handles errors in the following scenarios:
- If no matching token is found for a part of the line, an error message is printed, and the program exits.