Hello,

I have been learning Rust and wanted to make something exciting, so I thought, why not build a Lisp compiler in Rust B)

So here's what I did today: https://gist.github.com/anon2834678263/bcaa06e934f7b478be79203553f170ee

The tokenizer isn’t ready and might have horrible bugs, but at least I got comfortable with declaring variables as immutable by default and not surrounding stuff with parentheses unnecessarily. Oh, and also enums, which people say are the most powerful thing in Rust.

I am still not satisfied though since the code looks more like C than Rust xD

maybe some experienced people can correct me :)

  • TehPers@beehaw.org
    7 days ago

    maybe some experienced people can correct me

    Well first thing, I’d recommend saving the gist with a .rs extension so we get syntax highlighting :)

    You should convert your loop to iterate over input.chars() instead. Your current loop will have issues if someone writes, for example, “naïveté” due to some of those letters being multiple bytes long in UTF-8 (which is the encoding str and String use). What you can do is:

    let mut input = input.chars().peekable();
    while let Some(ch) = input.next() {
        // ...
    }
    

    This also lets you call input.peek() in the loop body to look at the next character without taking it from the iterator. input.peek().is_some() tells you if there’s more data (and is_none() tells you if you’re at the end of the input).
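
    As a rough sketch of how peek() could be used to read a multi-digit number: read_number and split_numbers are made-up names for illustration (not from your gist), and I'm assuming a number is just a run of ASCII digits.

    ```rust
    use std::iter::Peekable;
    use std::str::Chars;

    // Hypothetical helper: reads a run of ASCII digits that starts with `first`.
    fn read_number(first: char, rest: &mut Peekable<Chars<'_>>) -> String {
        let mut number = String::from(first);
        // peek() only looks at the next char, so the char that ends the
        // number stays in the iterator for the caller to handle.
        while let Some(&next) = rest.peek() {
            if !next.is_ascii_digit() {
                break;
            }
            number.push(next);
            rest.next(); // now actually consume the digit we peeked at
        }
        number
    }

    // Hypothetical driver: digit runs become one item, any other
    // non-whitespace char becomes its own item.
    fn split_numbers(input: &str) -> Vec<String> {
        let mut chars = input.chars().peekable();
        let mut out = Vec::new();
        while let Some(ch) = chars.next() {
            if ch.is_ascii_digit() {
                out.push(read_number(ch, &mut chars));
            } else if !ch.is_whitespace() {
                out.push(ch.to_string());
            }
        }
        out
    }
    ```

    Because the loop walks chars rather than bytes, multi-byte letters like “ï” come out as one item each instead of being split in the middle.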

    Even in C, I’d have made the big if-chain use if-else to avoid evaluating conditions that are known to be false. However, here I’d convert it to a big match statement:

    match ch {
        ' ' => {}
        ';' => {
            // ...
        }
        _ if ch.is_numeric() => {
            // ...
        }
        // ...
    }
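
    For reference, a version of that match that compiles on its own might look like the sketch below. CharKind and classify are hypothetical names, and the set of cases is only my guess at what a Lisp tokenizer cares about.

    ```rust
    // Hypothetical classification of a single character.
    #[derive(Debug, PartialEq)]
    enum CharKind {
        Whitespace,
        CommentStart,
        OpenParen,
        CloseParen,
        Digit,
        Other,
    }

    fn classify(ch: char) -> CharKind {
        // Arms are tried top to bottom and only the first match runs,
        // so there is no need for an explicit else chain.
        match ch {
            ' ' | '\t' | '\n' => CharKind::Whitespace,
            ';' => CharKind::CommentStart,
            '(' => CharKind::OpenParen,
            ')' => CharKind::CloseParen,
            // A guard handles conditions that are not plain patterns.
            _ if ch.is_numeric() => CharKind::Digit,
            // match must be exhaustive, so a catch-all arm comes last.
            _ => CharKind::Other,
        }
    }
    ```

    The compiler rejects a match that doesn't cover every case, which is one thing the if-chain can't give you.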
    

    As an optional (and more advanced) thing, you can return slices into the input instead of copies of those slices:

    fn tokenize(input: &str) -> Vec<(TokenType, &str)>
    

    If you want to do this, you should use .char_indices() instead of .chars() so you know the byte offsets at which to slice the input string.
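
    A minimal sketch of the slice-returning version, under some assumptions: the TokenType variants here are invented (yours will differ), and anything that isn't a paren or whitespace is treated as one token.

    ```rust
    // Assumed variants; the real TokenType in the gist may look different.
    #[derive(Debug, PartialEq, Clone, Copy)]
    enum TokenType {
        OpenParen,
        CloseParen,
        Number,
        Symbol,
    }

    fn tokenize(input: &str) -> Vec<(TokenType, &str)> {
        let mut tokens = Vec::new();
        // char_indices() yields (byte_offset, char) pairs.
        let mut chars = input.char_indices().peekable();
        while let Some((start, ch)) = chars.next() {
            match ch {
                _ if ch.is_whitespace() => {}
                // Parens are one byte long in UTF-8.
                '(' => tokens.push((TokenType::OpenParen, &input[start..start + 1])),
                ')' => tokens.push((TokenType::CloseParen, &input[start..start + 1])),
                _ => {
                    // Extend the token until whitespace, a paren, or the end.
                    let mut end = start + ch.len_utf8();
                    while let Some(&(idx, next)) = chars.peek() {
                        if next.is_whitespace() || next == '(' || next == ')' {
                            break;
                        }
                        end = idx + next.len_utf8();
                        chars.next();
                    }
                    // Both offsets sit on char boundaries, so slicing is safe.
                    let text = &input[start..end];
                    let kind = if text.chars().all(|c| c.is_ascii_digit()) {
                        TokenType::Number
                    } else {
                        TokenType::Symbol
                    };
                    tokens.push((kind, text));
                }
            }
        }
        tokens
    }
    ```

    The returned slices borrow from input, so the token list can't outlive the source string. That is the trade-off for not allocating a String per token.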

    Otherwise, you can use std::mem::take(&mut current_token) to replace current_token with an empty string and take (without cloning) the existing value out of that variable:

    use std::mem::take;
    
    tokens.push((blah, take(&mut current_token)));
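
    In context, that could look something like this sketch. split_words is a made-up stand-in for your tokenizer that only splits on whitespace, just to show where take() fits.

    ```rust
    use std::mem::take;

    // Hypothetical stand-in for the tokenizer: splits on whitespace only.
    fn split_words(input: &str) -> Vec<String> {
        let mut tokens = Vec::new();
        let mut current_token = String::new();
        for ch in input.chars() {
            if ch.is_whitespace() {
                if !current_token.is_empty() {
                    // take() moves the String out and leaves an empty
                    // String::new() behind, so nothing is cloned.
                    tokens.push(take(&mut current_token));
                }
            } else {
                current_token.push(ch);
            }
        }
        // Flush the last token if the input didn't end with whitespace.
        if !current_token.is_empty() {
            tokens.push(current_token);
        }
        tokens
    }
    ```

    take() works for any type that implements Default, so the same trick applies to a Vec or an Option.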
    
    • ☭可爱小猫☭@programming.devOP
      7 days ago

      Thank you kind stranger!

      I will take notes and make these changes. After improving this, I will move on to creating the AST, which will be more fun! Again, thanks :)

      • TehPers@beehaw.org
        7 days ago

        FYI once you’re done you should take a look at some of the parsing libraries out there. Some I’d recommend looking at:

        • pest - grammar based
        • lalrpop - more traditional LR(1) parser generator
        • winnow or nom - parser combinators, probably the easiest of these to use (and most flexible)