Removing normalization. After feedback and further consideration, I decided that normalization reduced security
without offering much in the way of benefits. It was intended to help reduce mistypings, but it's likely that a
user will run this multiple times anyway to confirm it creates a consistent mnemonic.
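
One way to see the security concern is that normalization maps many distinct passphrases onto a single input to the hash, which shrinks the space an attacker has to search. The following is a minimal sketch only, assuming the rules listed in the old README text (lowercase, keep `[a-z0-9 ]`, collapse and trim spaces). The `normalize` function here is a hypothetical stand-in written for illustration, not the code removed by this commit.

```rust
/// Hypothetical stand-in for the removed normalization, written from the
/// rules in the old README text (not copied from the removed code).
fn normalize(input: &str) -> String {
    // Lowercase ASCII letters, then keep only [a-z0-9 ].
    let kept: String = input
        .to_ascii_lowercase()
        .chars()
        .filter(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || *c == ' ')
        .collect();
    // Collapse runs of spaces into one and trim both ends.
    kept.split_whitespace().collect::<Vec<&str>>().join(" ")
}

fn main() {
    // Three different passphrases that all normalize to the same string.
    let variants = ["Hello WORLD!!!!", "hello world", "  HELLO, world?  "];
    for v in variants.iter() {
        println!("{:?} -> {:?}", v, normalize(v));
    }
}
```

Under these assumed rules, case, punctuation, and spacing contribute nothing to the result, so a user who relied on them for strength would gain nothing from them.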
ursuscamp committed Sep 6, 2022
1 parent 1f43f4e commit 6c14918
Showing 5 changed files with 11 additions and 164 deletions.
31 changes: 5 additions & 26 deletions README.md
@@ -30,46 +30,25 @@ BIP-39 seed phrases have become the lingua franca of Bitcoin key management. Alm

After providing a passphrase to the utility it:

1. Normalizes valid UTF-8 input to prevent some input entry errors:
* Converts it to ASCII lowercase.
* Removes invalid characters. Valid characters are `[a-z0-9 ]`.
* Condenses consecutive spaces to one space.
* Removes beginning and trailing spaces.
* For example: `"Hello WORLD!!!!"` becomes `"hello world"`.
2. Hashes it with SHA-256 ten million times.
1. Takes some input, typically a passphrase.
2. Hashes it with SHA-256 ten million times (by default).
3. Uses the result as entropy to generate a 12 or 24 word BIP-39 seed phrase.

### Is this not poor security?

Well, humans are relatively predictable, so it won't stand up to brute force attacks like a random seed phrase will. On the other hand, it might also be better opsec to have a passphrase that is hard to forget and only in your head, instead of a random phrase that you have to keep a physical copy just to remember.
Well, humans are relatively predictable, so it won't stand up to brute force attacks like a random seed mnemonic will. On the other hand, it might also be better opsec to have a passphrase that is hard to forget and only in your head, instead of a random phrase that you have to keep a physical copy just to remember.

In a pinch, it may be a good way to flee a hostile area with your wealth intact.

To help with brute force resistance, it uses 10,000,000 iterations of SHA-256, which takes several seconds on my modern MacBook. If you wish to opt for additional security, you can increase the default number of iterations and also turn off normalization.

### What if I wish to use non-Latin alphabet?

I would suggest disabling normalization.

Also, if you want, you don't need to use UTF-8 text. You can pass in any data, and if it doesn't recognize it as a UTF-8 compatible string, it will skip normalization completely. For example, random data:

```shell
$ cat /dev/urandom | head -c 1024 > junk.dat
$ brainseed -f junk.dat
arch few liar output sadness page lunch much swap much funny pupil
```

You may also want to pass in `-u` to force stop normalization, just in case, by some random chance, the bytes in your binary file happen to form a valid UTF-8 string.

However, this may be less secure than a passphrase since you must store that file somewhere. YMMV
To help with brute force resistance, it uses 10,000,000 iterations of SHA-256, which takes several seconds on my modern MacBook. If you wish to opt for additional security, you can increase the default number of iterations.

### How can I generate a 24 word phrase?

Use the `-l` or `--long` flag to get a 24 word seed phrase.
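
The 12 and 24 word lengths follow from the BIP-39 arithmetic: the entropy gets a checksum of one bit per 32 bits of entropy, and every 11 bits of the result select one word. A minimal sketch of that calculation follows. The function name `bip39_word_count` is hypothetical, and how brainseed slices the hash into entropy is not shown in this diff.

```rust
/// Number of mnemonic words BIP-39 defines for a given entropy size in bits.
/// Returns None for sizes the standard does not allow.
fn bip39_word_count(entropy_bits: u32) -> Option<u32> {
    // BIP-39 allows 128 to 256 bits of entropy, in multiples of 32.
    if entropy_bits < 128 || entropy_bits > 256 || entropy_bits % 32 != 0 {
        return None;
    }
    // One checksum bit is appended per 32 bits of entropy.
    let checksum_bits = entropy_bits / 32;
    // Each word encodes 11 bits (the word list has 2048 entries).
    Some((entropy_bits + checksum_bits) / 11)
}

fn main() {
    for bits in [128u32, 256].iter() {
        println!("{} bits -> {:?} words", bits, bip39_word_count(*bits));
    }
}
```

So 128 bits of entropy yield 12 words and 256 bits yield 24 words.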

### What about rainbow tables?

Yup, that's a danger. Use a phrase meaningful to you, not a famous movie line or something. Also consider using a custom number of SHA-256 iterations as this will help foil rainbow attacks.
Yup, that's a danger. Use a phrase meaningful to you, not a famous movie line or something like that. Also consider using a custom number of SHA-256 iterations as this will help foil rainbow attacks.

If you absolutely must use a famous movie line, then salt it with some other meaningful data, like the year you lost your virginity, e.g.:

6 changes: 0 additions & 6 deletions src/cli.rs
@@ -23,14 +23,8 @@ pub struct Cli {
#[clap(short, long, help = "Return a 24 word seed phrase [default: 12]")]
pub long: bool,

#[clap(short, long, help = "Do not normalize input data")]
pub unnormalized: bool,

#[clap(short, long, help = "Output to file")]
pub output: Option<PathBuf>,

#[clap(long, help = "Only output the normalized input and quit")]
pub normalized_only: bool,
}

impl Cli {
119 changes: 6 additions & 113 deletions src/generator.rs
@@ -7,72 +7,15 @@ pub struct Generator {
data: Vec<u8>,
iterations: usize,
long: bool,
unnormalized: bool,
}

impl Generator {
/// This is the entry point for the struct. This will normalize the input and create the mnemonic.
/// This is the entry point for the struct.
pub fn seed(&mut self) -> Mnemonic {
if self.should_normalize() {
self.attempt_normalize();
}
self.hash_iterations();
bip39::Mnemonic::from_entropy(self.entropy()).unwrap()
}

/// Return a reference to internal data.
pub fn data(&self) -> &[u8] {
&self.data
}

/// Should this generator attempt to normalize the input?
fn should_normalize(&self) -> bool {
!self.unnormalized
}

/// Remove invalid characters, then remove consecutive spaces (" " becomes " "),
/// then finally trim all whitespace from the ends of the string.
pub fn normalize(&self, data: &str) -> String {
let mut next_str = String::with_capacity(data.len());
let start = self.remove_invalid_chars(data);

let mut skip_ws = false;
for ch in start.chars() {
if ch == ' ' && !skip_ws {
next_str.push(ch);
skip_ws = true;
} else if ch != ' ' {
next_str.push(ch);
skip_ws = false;
}
}

next_str.trim().to_string()
}

/// Convert all ASCII characters to lowercase and remove invalid characters.
/// Valid characters are [a-z0-9 ].
fn remove_invalid_chars(&self, data: &str) -> String {
let mut next_str = String::with_capacity(data.len());
let start = data.to_ascii_lowercase();

for ch in start.chars() {
if ('a'..'z').contains(&ch) || ('0'..'9').contains(&ch) || ch == ' ' {
next_str.push(ch);
}
}

next_str
}

/// This will attempt to normalize data. If the data is a valid UTF-8 string, then it will normalize it.
/// If it is not valid UTF-8, then it assumes the file is binary and passes it straight through.
fn attempt_normalize(&mut self) {
if let Ok(string) = std::str::from_utf8(&self.data) {
self.data = self.normalize(string).into_bytes().to_vec();
}
}

/// Returns the entropy needed for generating the BIP-39 mnemonic.
fn entropy(&self) -> &[u8] {
if self.long {
@@ -102,7 +45,6 @@ impl From<Cli> for Generator {
data: cli.get_input(),
iterations: cli.iterations,
long: cli.long,
unnormalized: cli.unnormalized,
}
}
}
@@ -117,7 +59,6 @@ mod tests {
data: data.into(),
iterations: 1,
long: false,
unnormalized: false,
}
}

@@ -126,73 +67,25 @@ mod tests {
data: data.into(),
iterations: 1,
long: true,
unnormalized: false,
}
}

pub fn normal_input() -> &'static str {
pub fn input() -> &'static str {
"hello world"
}

pub fn abnormal_input() -> &'static str {
"Hel!lo wo!RLD! "
}
}

#[test]
fn test_remove_invalid_chars() {
let gen = util::gen12(" Hel!lo 1 world! ");
assert_eq!(
gen.remove_invalid_chars(" Hel!lo 1 world! "),
" hello 1 world "
);
}

#[test]
fn test_normalize() {
let mut gen = util::gen12(" Hel!lo 1 ! WORLD!! ");
gen.attempt_normalize();
assert_eq!(gen.data, b"hello 1 world");

let mut gen = util::gen12("hello world !");
gen.attempt_normalize();
assert_eq!(gen.data, b"hello world");
}

#[test]
fn test_binary_normalization_ignored() {
let mut gen = util::gen12("");
let data = include_bytes!("../test/junk.dat"); // Set data to binary data
gen.data = data.to_vec();
gen.attempt_normalize();
assert_eq!(gen.data, data);
}

#[test]
fn test_seed_phrase_short_normal() {
fn test_short_seed_phrase() {
let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void";
let mut gen = util::gen12(util::normal_input());
assert_eq!(gen.seed().to_string(), expected);
}

#[test]
fn test_seed_phrase_short_abnormal() {
let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void";
let mut gen = util::gen12(util::abnormal_input());
assert_eq!(gen.seed().to_string(), expected);
}

#[test]
fn test_long_seed_phrase_normal() {
let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void embark jewel mistake engine liberty innocent captain urban soda jewel dash daring";
let mut gen = util::gen24(util::normal_input());
let mut gen = util::gen12(util::input());
assert_eq!(gen.seed().to_string(), expected);
}

#[test]
fn test_long_seed_phrase_abnormal() {
fn test_long_seed_phrase() {
let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void embark jewel mistake engine liberty innocent captain urban soda jewel dash daring";
let mut gen = util::gen24(util::abnormal_input());
let mut gen = util::gen24(util::input());
assert_eq!(gen.seed().to_string(), expected);
}
}
3 changes: 0 additions & 3 deletions src/main.rs
@@ -10,9 +10,6 @@ fn main() {
let cli = Cli::parse();
let mut gen = Generator::from(cli.clone());

// Check if normalize only is selected, and do that if it is.
util::show_only_normalize(&cli, &gen);

let seed = gen.seed();
cli.write_output(seed.to_string().as_bytes());
}
16 changes: 0 additions & 16 deletions src/util.rs
@@ -1,20 +1,4 @@
use crate::{cli::Cli, generator::Generator};

pub fn exit_with_error(msg: &str) -> ! {
eprintln!("{msg}");
std::process::exit(1)
}

pub fn show_only_normalize(cli: &Cli, gen: &Generator) {
if cli.normalized_only {
let s = std::str::from_utf8(gen.data());

if let Ok(s) = s {
let s = gen.normalize(s);
println!("{s}");
std::process::exit(0);
} else {
exit_with_error("Binary data cannot be normalized.");
}
}
}
