Removing normalization. After feedback and further consideration, I decided that normalization reduced security without offering much in the way of benefits. It was intended to help reduce mistypings, but it's likely that a user will run this multiple times anyway to confirm it creates a consistent mnemonic.
ursuscamp · Sep 6, 2022 · commit 6c14918 · 1 parent 1f43f4e
Showing 5 changed files with 11 additions and 164 deletions.
diff --git a/README.md b/README.md
@@ -30,46 +30,25 @@ BIP-39 seed phrases have become the lingua franca of Bitcoin key management. Alm
 
 After providing a passphrase to the utility it:
 
-1. Normalizes valid UTF-8 input to prevent some input entry errors:
-   * Converts it to ASCII lowercase.
-   * Removes invalid characters. Valid characters are `[a-z0-9 ]`.
-   * Condenses consecutive spaces to one space.
-   * Removes beginning and trailing spaces.
-   * For example: `"Hello WORLD!!!!"` becomes `"hello world"`.
-2. Hashes it with SHA-256 ten million times.
+1. Takes some input, typically a passphrase.
+2. Hashes it with SHA-256 ten million times (by default).
 3. Uses the result as entropy to generate a 12 or 24 word BIP-39 seed phrase.
 
 ### Is this not poor security?
 
-Well, humans are relatively predictable, so it won't stand up to brute force attacks like a random seed phrase will. On the other hand, it might also be better opsec to have a passphrase that is hard to forget and only in your head, instead of a random phrase that you have to keep a physical copy just to remember.
+Well, humans are relatively predictable, so it won't stand up to brute force attacks like a random seed mnemonic will. On the other hand, it might also be better opsec to have a passphrase that is hard to forget and only in your head, instead of a random phrase that you have to keep a physical copy just to remember.
 
 In a pinch, it may be a good way to flee a hostile area with your wealth intact.
 
-To help with brute force resistance, it uses 10,000,000 iterations of SHA-256, which takes several seconds on my modern MacBook. If you wish to opt for additional security, you can increase the default number of iterations and also turn off normalization.
-
-### What if I wish to use non-Latin alphabet?
-
-I would suggest disabling normalization.
-
-Also, if you want, you don't need to use UTF-8 text. You can pass in any data, and if it doesn't recognize it as a UTF-8 compatible string, it will skip normalization completely. For example, random data:
-
-```shell
-$ cat /dev/urandom | head -c 1024 > junk.dat
-$ brainseed -f junk.dat
-arch few liar output sadness page lunch much swap much funny pupil
-```
-
-You may also want to pass in `-u` to force stop normalization, just in case, by some random chance, the bytes in your binary file happen to form a valid UTF-8 string.
-
-However, this may be less secure than a passphrase since you must store that file somewhere. YMMV
+To help with brute force resistance, it uses 10,000,000 iterations of SHA-256, which takes several seconds on my modern MacBook. If you wish to opt for additional security, you can increase the default number of iterations.
 
 ### How can I generate a 24 word phrase?
 
 Use the `-l` or `--long` flag to get a 24 word seed phrase.
 
 ### What about rainbow tables?
 
-Yup, that's a danger. Use a phrase meaningful to you, not a famous movie line or something. Also consider using a custom number of SHA-256 iterations as this will help foil rainbow attacks.
+Yup, that's a danger. Use a phrase meaningful to you, not a famous movie line or something like that. Also consider using a custom number of SHA-256 iterations as this will help foil rainbow attacks.
 
 If you absolutely must use a famous movie line, then salt it with some other meaningful data, like the year you lost your virginity, e.g.:
 

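The README now describes the pipeline as "take some input, hash it with SHA-256 N times, use the result as entropy". The body of `hash_iterations` is not part of this diff, so the following is only a sketch of one plausible reading: each round hashes the previous round's digest. The `sha256` function here is a dependency-free implementation written for illustration (the project itself presumably relies on a crate, which is an assumption), and the standalone `hash_iterations` signature is hypothetical.

```rust
// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Illustrative, unoptimized SHA-256 over a byte slice.
fn sha256(msg: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];

    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut data = msg.to_vec();
    data.push(0x80);
    while data.len() % 64 != 56 {
        data.push(0);
    }
    data.extend_from_slice(&((msg.len() as u64) * 8).to_be_bytes());

    for block in data.chunks(64) {
        // Message schedule.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }

        // Compression: v holds the working variables a..h.
        let mut v = h;
        for i in 0..64 {
            let s1 = v[4].rotate_right(6) ^ v[4].rotate_right(11) ^ v[4].rotate_right(25);
            let ch = (v[4] & v[5]) ^ (!v[4] & v[6]);
            let t1 = v[7].wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = v[0].rotate_right(2) ^ v[0].rotate_right(13) ^ v[0].rotate_right(22);
            let maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
            let t2 = s0.wrapping_add(maj);
            v = [t1.wrapping_add(t2), v[0], v[1], v[2], v[3].wrapping_add(t1), v[4], v[5], v[6]];
        }
        for i in 0..8 {
            h[i] = h[i].wrapping_add(v[i]);
        }
    }

    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Assumed chaining: every round hashes the previous round's output.
/// With `iterations == 0` the input is returned unchanged.
fn hash_iterations(input: &[u8], iterations: usize) -> Vec<u8> {
    let mut data = input.to_vec();
    for _ in 0..iterations {
        data = sha256(&data).to_vec();
    }
    data
}
```

Calling `hash_iterations(b"hello world", 10_000_000)` would mirror the default iteration count mentioned in the README. Because the input bytes are now hashed as-is, `"Hello World"` and `"hello world"` lead to unrelated digests, which is the behavior this commit intends.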
diff --git a/src/cli.rs b/src/cli.rs
@@ -23,14 +23,8 @@ pub struct Cli {
     #[clap(short, long, help = "Return a 24 word seed phrase [default: 12]")]
     pub long: bool,
 
-    #[clap(short, long, help = "Do not normalize input data")]
-    pub unnormalized: bool,
-
     #[clap(short, long, help = "Output to file")]
     pub output: Option<PathBuf>,
-
-    #[clap(long, help = "Only output the normalized input and quit")]
-    pub normalized_only: bool,
 }
 
 impl Cli {

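The `long` flag kept in `Cli` switches between a 12 and a 24 word phrase. In BIP-39 the word count follows from the entropy length: one checksum bit is appended per 32 bits of entropy, and the result is split into 11-bit word indices. A minimal sketch of that arithmetic follows. `word_count` and `entropy_for` are hypothetical names, and taking the leading 16 bytes of the digest for the short phrase is an assumption, since the body of `entropy()` is not fully shown in this diff. It is at least consistent with the tests, where the 24 word phrase starts with the same words as the 12 word one.

```rust
/// Number of mnemonic words BIP-39 yields for `entropy_bytes` bytes of entropy.
/// Returns `None` for sizes the spec does not allow (128 to 256 bits, in 32-bit steps).
fn word_count(entropy_bytes: usize) -> Option<usize> {
    let bits = entropy_bytes * 8;
    if !(128..=256).contains(&bits) || bits % 32 != 0 {
        return None;
    }
    let checksum_bits = bits / 32;
    Some((bits + checksum_bits) / 11)
}

/// Hypothetical helper: pick the entropy slice from a 32-byte digest.
/// Assumption: the short phrase uses the first 16 bytes, the long one all 32.
fn entropy_for(digest: &[u8; 32], long: bool) -> &[u8] {
    if long {
        &digest[..]
    } else {
        &digest[..16]
    }
}
```

So 16 bytes give (128 + 4) / 11 = 12 words and 32 bytes give (256 + 8) / 11 = 24 words.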
diff --git a/src/generator.rs b/src/generator.rs
@@ -7,72 +7,15 @@ pub struct Generator {
     data: Vec<u8>,
     iterations: usize,
     long: bool,
-    unnormalized: bool,
 }
 
 impl Generator {
-    /// This is the entry point for the struct. This will normalize the input and create the mnemonic.
+    /// This is the entry point for the struct.
     pub fn seed(&mut self) -> Mnemonic {
-        if self.should_normalize() {
-            self.attempt_normalize();
-        }
         self.hash_iterations();
         bip39::Mnemonic::from_entropy(self.entropy()).unwrap()
     }
 
-    /// Return a reference to internal data.
-    pub fn data(&self) -> &[u8] {
-        &self.data
-    }
-
-    /// Should this generator attempt to normalize the input?
-    fn should_normalize(&self) -> bool {
-        !self.unnormalized
-    }
-
-    /// Remove invalid characters, then remove consecutive spaces ("   " becomes " "),
-    /// then finally trim all whitespace from the ends of the string.
-    pub fn normalize(&self, data: &str) -> String {
-        let mut next_str = String::with_capacity(data.len());
-        let start = self.remove_invalid_chars(data);
-
-        let mut skip_ws = false;
-        for ch in start.chars() {
-            if ch == ' ' && !skip_ws {
-                next_str.push(ch);
-                skip_ws = true;
-            } else if ch != ' ' {
-                next_str.push(ch);
-                skip_ws = false;
-            }
-        }
-
-        next_str.trim().to_string()
-    }
-
-    /// Convert all ASCII characters to lowercase and remove invalid characters.
-    /// Valid characters are [a-z0-9 ].
-    fn remove_invalid_chars(&self, data: &str) -> String {
-        let mut next_str = String::with_capacity(data.len());
-        let start = data.to_ascii_lowercase();
-
-        for ch in start.chars() {
-            if ('a'..'z').contains(&ch) || ('0'..'9').contains(&ch) || ch == ' ' {
-                next_str.push(ch);
-            }
-        }
-
-        next_str
-    }
-
-    /// This will attempt to normalize data. If the data is a valid UTF-8 string, then it will normalize it.
-    /// If it is not valid UTF-8, then it assumes the file is binary and passes it straight through.
-    fn attempt_normalize(&mut self) {
-        if let Ok(string) = std::str::from_utf8(&self.data) {
-            self.data = self.normalize(string).into_bytes().to_vec();
-        }
-    }
-
     /// Returns the entropy needed for generating the BIP-39 mnemonic.
     fn entropy(&self) -> &[u8] {
         if self.long {
@@ -102,7 +45,6 @@ impl From<Cli> for Generator {
             data: cli.get_input(),
             iterations: cli.iterations,
             long: cli.long,
-            unnormalized: cli.unnormalized,
         }
     }
 }
@@ -117,7 +59,6 @@ mod tests {
                 data: data.into(),
                 iterations: 1,
                 long: false,
-                unnormalized: false,
             }
         }
 
@@ -126,73 +67,25 @@ mod tests {
                 data: data.into(),
                 iterations: 1,
                 long: true,
-                unnormalized: false,
             }
         }
 
-        pub fn normal_input() -> &'static str {
+        pub fn input() -> &'static str {
             "hello world"
         }
-
-        pub fn abnormal_input() -> &'static str {
-            "Hel!lo    wo!RLD!   "
-        }
-    }
-
-    #[test]
-    fn test_remove_invalid_chars() {
-        let gen = util::gen12(" Hel!lo 1    world!  ");
-        assert_eq!(
-            gen.remove_invalid_chars(" Hel!lo 1    world!  "),
-            " hello 1    world  "
-        );
-    }
-
-    #[test]
-    fn test_normalize() {
-        let mut gen = util::gen12("  Hel!lo    1  !    WORLD!!   ");
-        gen.attempt_normalize();
-        assert_eq!(gen.data, b"hello 1 world");
-
-        let mut gen = util::gen12("hello    world   !");
-        gen.attempt_normalize();
-        assert_eq!(gen.data, b"hello world");
-    }
-
-    #[test]
-    fn test_binary_normalization_ignored() {
-        let mut gen = util::gen12("");
-        let data = include_bytes!("../test/junk.dat"); // Set data to binary data
-        gen.data = data.to_vec();
-        gen.attempt_normalize();
-        assert_eq!(gen.data, data);
     }
 
     #[test]
-    fn test_seed_phrase_short_normal() {
+    fn test_short_seed_phrase() {
         let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void";
-        let mut gen = util::gen12(util::normal_input());
-        assert_eq!(gen.seed().to_string(), expected);
-    }
-
-    #[test]
-    fn test_seed_phrase_short_abnormal() {
-        let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void";
-        let mut gen = util::gen12(util::abnormal_input());
-        assert_eq!(gen.seed().to_string(), expected);
-    }
-
-    #[test]
-    fn test_long_seed_phrase_normal() {
-        let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void embark jewel mistake engine liberty innocent captain urban soda jewel dash daring";
-        let mut gen = util::gen24(util::normal_input());
+        let mut gen = util::gen12(util::input());
         assert_eq!(gen.seed().to_string(), expected);
     }
 
     #[test]
-    fn test_long_seed_phrase_abnormal() {
+    fn test_long_seed_phrase() {
         let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void embark jewel mistake engine liberty innocent captain urban soda jewel dash daring";
-        let mut gen = util::gen24(util::abnormal_input());
+        let mut gen = util::gen24(util::input());
         assert_eq!(gen.seed().to_string(), expected);
     }
 }
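
To illustrate the rationale in the commit message, here is a rough sketch of what the removed normalization did, based on the README's old description (lowercase, keep only `[a-z0-9 ]`, collapse and trim spaces). It is not the deleted code: the deleted `remove_invalid_chars` used the half-open ranges `('a'..'z')` and `('0'..'9')`, which in Rust exclude `z` and `9`, whereas this sketch uses inclusive ranges to match the documented character set.

```rust
/// Sketch of the removed behavior, following the README's description.
fn normalize(input: &str) -> String {
    // Lowercase ASCII letters and drop everything outside [a-z0-9 ].
    let filtered: String = input
        .to_ascii_lowercase()
        .chars()
        .filter(|c| matches!(c, 'a'..='z' | '0'..='9' | ' '))
        .collect();

    // split_whitespace drops leading/trailing spaces and collapses runs of them.
    filtered.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

With this, `"Hello WORLD!!!!"`, `"Hel!lo    wo!RLD!   "`, and `"hello world"` all collapse to `"hello world"`. Case, punctuation, and spacing therefore contributed nothing to the hashed input, which shrinks the space an attacker has to search.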
diff --git a/src/main.rs b/src/main.rs
@@ -10,9 +10,6 @@ fn main() {
     let cli = Cli::parse();
     let mut gen = Generator::from(cli.clone());
 
-    // Check if normalize only is selected, and do that if it is.
-    util::show_only_normalize(&cli, &gen);
-
     let seed = gen.seed();
     cli.write_output(seed.to_string().as_bytes());
 }
diff --git a/src/util.rs b/src/util.rs
@@ -1,20 +1,4 @@
-use crate::{cli::Cli, generator::Generator};
-
 pub fn exit_with_error(msg: &str) -> ! {
     eprintln!("{msg}");
     std::process::exit(1)
 }
-
-pub fn show_only_normalize(cli: &Cli, gen: &Generator) {
-    if cli.normalized_only {
-        let s = std::str::from_utf8(gen.data());
-
-        if let Ok(s) = s {
-            let s = gen.normalize(s);
-            println!("{s}");
-            std::process::exit(0);
-        } else {
-            exit_with_error("Binary data cannot be normalized.");
-        }
-    }
-}