Merge pull request #1 from ursuscamp/remove-normalization
Removing normalization.
ursuscamp authored Sep 6, 2022
2 parents 1f43f4e + 6c14918 commit c065a12
Showing 5 changed files with 11 additions and 164 deletions.
31 changes: 5 additions & 26 deletions README.md
@@ -30,46 +30,25 @@ BIP-39 seed phrases have become the lingua franca of Bitcoin key management. Alm

After providing a passphrase to the utility it:

1. Normalizes valid UTF-8 input to prevent some input entry errors:
* Converts it to ASCII lowercase.
* Removes invalid characters. Valid characters are `[a-z0-9 ]`.
* Condenses consecutive spaces to one space.
* Removes beginning and trailing spaces.
* For example: `"Hello WORLD!!!!"` becomes `"hello world"`.
2. Hashes it with SHA-256 ten million times.
1. Takes some input, typically a passphrase.
2. Hashes it with SHA-256 ten million times (by default).
3. Uses the result as entropy to generate a 12 or 24 word BIP-39 seed phrase.
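
The hashing step in the list above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the code in `src/generator.rs`: the helper name `stretch` is hypothetical, and the hash is passed in as a closure so the sketch builds with the standard library alone. In the real tool that closure would be SHA-256 and the result would go to a BIP-39 library; `toy_hash` here is a non-cryptographic stand-in so the loop can actually run.

```rust
/// Feed the output of `hash` back into itself `iterations` times.
/// `stretch` is a hypothetical helper name used for illustration only.
fn stretch<F>(input: &[u8], iterations: usize, hash: F) -> Vec<u8>
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    let mut data = input.to_vec();
    for _ in 0..iterations {
        data = hash(&data);
    }
    data
}

/// Toy stand-in for SHA-256 (NOT cryptographic): folds the input into 32 bytes.
fn toy_hash(data: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; 32];
    for (i, byte) in data.iter().enumerate() {
        let slot = i % 32;
        out[slot] = out[slot].rotate_left(3) ^ byte.wrapping_add(i as u8);
    }
    out
}

fn main() {
    // Assumption: a small iteration count so the demo finishes instantly.
    let digest = stretch(b"hello world", 1_000, toy_hash);
    println!("{} bytes of stretched output", digest.len());
}
```

Because each round depends on the previous one, the rounds for a single passphrase cannot be skipped or reordered, which is what makes the iteration count a cost knob.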

### Is this not poor security?

Well, humans are relatively predictable, so it won't stand up to brute force attacks like a random seed phrase will. On the other hand, it might also be better opsec to have a passphrase that is hard to forget and only in your head, instead of a random phrase that you have to keep a physical copy just to remember.
Well, humans are relatively predictable, so it won't stand up to brute force attacks like a random seed mnemonic will. On the other hand, it might also be better opsec to have a passphrase that is hard to forget and only in your head, instead of a random phrase that you have to keep a physical copy just to remember.

In a pinch, it may be a good way to flee a hostile area with your wealth intact.

To help with brute force resistance, it uses 10,000,000 iterations of SHA-256, which takes several seconds on my modern MacBook. If you wish to opt for additional security, you can increase the default number of iterations and also turn off normalization.

### What if I wish to use non-Latin alphabet?

I would suggest disabling normalization.

Also, if you want, you don't need to use UTF-8 text. You can pass in any data, and if it doesn't recognize it as a UTF-8 compatible string, it will skip normalization completely. For example, random data:

```shell
$ cat /dev/urandom | head -c 1024 > junk.dat
$ brainseed -f junk.dat
arch few liar output sadness page lunch much swap much funny pupil
```

You may also want to pass in `-u` to force stop normalization, just in case, by some random chance, the bytes in your binary file happen to form a valid UTF-8 string.

However, this may be less secure than a passphrase since you must store that file somewhere. YMMV
To help with brute force resistance, it uses 10,000,000 iterations of SHA-256, which takes several seconds on my modern MacBook. If you wish to opt for additional security, you can increase the default number of iterations.
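
As rough arithmetic, every guess costs an attacker the same number of chained SHA-256 calls that it costs you, so their guess rate is their raw hash rate divided by the iteration count. The sketch below just does that division; the function name and the attacker hash rate are hypothetical values picked for illustration, not measurements.

```rust
/// Guesses per second an attacker can test, given a raw hash rate
/// and the number of chained iterations needed per guess.
fn guesses_per_second(hashes_per_second: f64, iterations: u64) -> f64 {
    hashes_per_second / iterations as f64
}

fn main() {
    // Hypothetical attacker hash rate, chosen only for illustration.
    let rate = 1.0e9;
    let default_iterations = 10_000_000;
    println!("{} guesses/s", guesses_per_second(rate, default_iterations));
    // Doubling the iterations halves the attacker's guess rate.
    println!("{} guesses/s", guesses_per_second(rate, 2 * default_iterations));
}
```

The same division applies to you, which is why a higher iteration count also means a longer wait each time you regenerate the phrase.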

### How can I generate a 24 word phrase?

Use the `-l` or `--long` flag to get a 24 word seed phrase.
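
For reference, the word count follows from the BIP-39 rule that a checksum of one bit per 32 bits of entropy is appended and the result is split into 11-bit groups. The sketch below only does that arithmetic. The function name `word_count` is hypothetical, and it assumes the tool feeds 128 bits of entropy for the short phrase and 256 bits for the long one.

```rust
/// Number of mnemonic words for `entropy_bits` of entropy, per BIP-39:
/// a checksum of entropy_bits / 32 bits is appended, then the total is
/// split into 11-bit groups, one word per group.
fn word_count(entropy_bits: u32) -> Option<u32> {
    // BIP-39 allows 128 to 256 bits of entropy in multiples of 32.
    if !(128..=256).contains(&entropy_bits) || entropy_bits % 32 != 0 {
        return None;
    }
    let checksum_bits = entropy_bits / 32;
    Some((entropy_bits + checksum_bits) / 11)
}

fn main() {
    for bits in [128, 256] {
        println!("{bits} bits -> {:?} words", word_count(bits));
    }
}
```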

### What about rainbow tables?

Yup, that's a danger. Use a phrase meaningful to you, not a famous movie line or something. Also consider using a custom number of SHA-256 iterations as this will help foil rainbow attacks.
Yup, that's a danger. Use a phrase meaningful to you, not a famous movie line or something like that. Also consider using a custom number of SHA-256 iterations as this will help foil rainbow attacks.

If you absolutely must use a famous movie line, then salt it with some other meaningful data, like the year you lost your virginity, e.g.:

6 changes: 0 additions & 6 deletions src/cli.rs
@@ -23,14 +23,8 @@ pub struct Cli {
#[clap(short, long, help = "Return a 24 word seed phrase [default: 12]")]
pub long: bool,

#[clap(short, long, help = "Do not normalize input data")]
pub unnormalized: bool,

#[clap(short, long, help = "Output to file")]
pub output: Option<PathBuf>,

#[clap(long, help = "Only output the normalized input and quit")]
pub normalized_only: bool,
}

impl Cli {
119 changes: 6 additions & 113 deletions src/generator.rs
@@ -7,72 +7,15 @@ pub struct Generator {
data: Vec<u8>,
iterations: usize,
long: bool,
unnormalized: bool,
}

impl Generator {
/// This is the entry point for the struct. This will normalize the input and create the mnemonic.
/// This is the entry point for the struct.
pub fn seed(&mut self) -> Mnemonic {
if self.should_normalize() {
self.attempt_normalize();
}
self.hash_iterations();
bip39::Mnemonic::from_entropy(self.entropy()).unwrap()
}

/// Return a reference to internal data.
pub fn data(&self) -> &[u8] {
&self.data
}

/// Should this generator attempt to normalize the input?
fn should_normalize(&self) -> bool {
!self.unnormalized
}

/// Remove invalid characters, then remove consecutive spaces ("  " becomes " "),
/// then finally trim all whitespace from the ends of the string.
pub fn normalize(&self, data: &str) -> String {
let mut next_str = String::with_capacity(data.len());
let start = self.remove_invalid_chars(data);

let mut skip_ws = false;
for ch in start.chars() {
if ch == ' ' && !skip_ws {
next_str.push(ch);
skip_ws = true;
} else if ch != ' ' {
next_str.push(ch);
skip_ws = false;
}
}

next_str.trim().to_string()
}

/// Convert all ASCII characters to lowercase and remove invalid characters.
/// Valid characters are [a-z0-9 ].
fn remove_invalid_chars(&self, data: &str) -> String {
let mut next_str = String::with_capacity(data.len());
let start = data.to_ascii_lowercase();

for ch in start.chars() {
if ('a'..='z').contains(&ch) || ('0'..='9').contains(&ch) || ch == ' ' {
next_str.push(ch);
}
}

next_str
}

/// This will attempt to normalize data. If the data is a valid UTF-8 string, then it will normalize it.
/// If it is not valid UTF-8, then it assumes the file is binary and passes it straight through.
fn attempt_normalize(&mut self) {
if let Ok(string) = std::str::from_utf8(&self.data) {
self.data = self.normalize(string).into_bytes().to_vec();
}
}

/// Returns the entropy needed for generating the BIP-39 mnemonic.
fn entropy(&self) -> &[u8] {
if self.long {
@@ -102,7 +45,6 @@ impl From<Cli> for Generator {
data: cli.get_input(),
iterations: cli.iterations,
long: cli.long,
unnormalized: cli.unnormalized,
}
}
}
@@ -117,7 +59,6 @@ mod tests {
data: data.into(),
iterations: 1,
long: false,
unnormalized: false,
}
}

@@ -126,73 +67,25 @@ mod tests {
data: data.into(),
iterations: 1,
long: true,
unnormalized: false,
}
}

pub fn normal_input() -> &'static str {
pub fn input() -> &'static str {
"hello world"
}

pub fn abnormal_input() -> &'static str {
"Hel!lo wo!RLD! "
}
}

#[test]
fn test_remove_invalid_chars() {
let gen = util::gen12(" Hel!lo 1 world! ");
assert_eq!(
gen.remove_invalid_chars(" Hel!lo 1 world! "),
" hello 1 world "
);
}

#[test]
fn test_normalize() {
let mut gen = util::gen12(" Hel!lo 1 ! WORLD!! ");
gen.attempt_normalize();
assert_eq!(gen.data, b"hello 1 world");

let mut gen = util::gen12("hello world !");
gen.attempt_normalize();
assert_eq!(gen.data, b"hello world");
}

#[test]
fn test_binary_normalization_ignored() {
let mut gen = util::gen12("");
let data = include_bytes!("../test/junk.dat"); // Set data to binary data
gen.data = data.to_vec();
gen.attempt_normalize();
assert_eq!(gen.data, data);
}

#[test]
fn test_seed_phrase_short_normal() {
fn test_short_seed_phrase() {
let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void";
let mut gen = util::gen12(util::normal_input());
assert_eq!(gen.seed().to_string(), expected);
}

#[test]
fn test_seed_phrase_short_abnormal() {
let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void";
let mut gen = util::gen12(util::abnormal_input());
assert_eq!(gen.seed().to_string(), expected);
}

#[test]
fn test_long_seed_phrase_normal() {
let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void embark jewel mistake engine liberty innocent captain urban soda jewel dash daring";
let mut gen = util::gen24(util::normal_input());
let mut gen = util::gen12(util::input());
assert_eq!(gen.seed().to_string(), expected);
}

#[test]
fn test_long_seed_phrase_abnormal() {
fn test_long_seed_phrase() {
let expected = "rich hard unveil charge stadium affair net ski style stadium helmet void embark jewel mistake engine liberty innocent captain urban soda jewel dash daring";
let mut gen = util::gen24(util::abnormal_input());
let mut gen = util::gen24(util::input());
assert_eq!(gen.seed().to_string(), expected);
}
}
3 changes: 0 additions & 3 deletions src/main.rs
@@ -10,9 +10,6 @@ fn main() {
let cli = Cli::parse();
let mut gen = Generator::from(cli.clone());

// Check if normalize only is selected, and do that if it is.
util::show_only_normalize(&cli, &gen);

let seed = gen.seed();
cli.write_output(seed.to_string().as_bytes());
}
16 changes: 0 additions & 16 deletions src/util.rs
@@ -1,20 +1,4 @@
use crate::{cli::Cli, generator::Generator};

pub fn exit_with_error(msg: &str) -> ! {
eprintln!("{msg}");
std::process::exit(1)
}

pub fn show_only_normalize(cli: &Cli, gen: &Generator) {
if cli.normalized_only {
let s = std::str::from_utf8(gen.data());

if let Ok(s) = s {
let s = gen.normalize(s);
println!("{s}");
std::process::exit(0);
} else {
exit_with_error("Binary data cannot be normalized.");
}
}
}
