key-data-analysis

This repository contains an assorted collection of experiments on an acoustic side-channel attack: recovering the text a user typed from nothing but an audio recording of their typing.

Some approaches exploit patterns in the time delays between consecutive key presses, others exploit the frequency spectrum of each keystroke's sound, and some combine the two. I also compare character-level prediction against a two-stage approach that first identifies word-delimiting characters and then performs word-level prediction.

I show decent results for clean, noise-free datasets but haven't been able to show good results for in-the-wild data.

Time Delays Between Keystrokes

Considering all possible pairs of consecutive key presses, most key pairs have very similar time delays on average. However, one clear pattern we can observe is that key pairs that use the same finger twice have a slightly larger time delay on average (except possibly when the key pair is the same key twice).
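As a rough illustration of how such per-finger-pair averages could be computed, here is a minimal sketch. The `FINGER` mapping, the `mean_delay_by_finger_pair` function, and the `(key, press_time)` event format are all assumptions made for this example and are not taken from the repository's code.

```python
from collections import defaultdict

# Hypothetical touch-typing finger assignment for QWERTY letter keys
# (illustrative only; real typists vary and the repository may differ).
FINGER = {
    "q": "L-pinky", "a": "L-pinky", "z": "L-pinky",
    "w": "L-ring", "s": "L-ring", "x": "L-ring",
    "e": "L-middle", "d": "L-middle", "c": "L-middle",
    "r": "L-index", "f": "L-index", "v": "L-index",
    "t": "L-index", "g": "L-index", "b": "L-index",
    "y": "R-index", "h": "R-index", "n": "R-index",
    "u": "R-index", "j": "R-index", "m": "R-index",
    "i": "R-middle", "k": "R-middle",
    "o": "R-ring", "l": "R-ring",
    "p": "R-pinky",
}


def mean_delay_by_finger_pair(events):
    """Average delay between consecutive key presses, grouped by finger pair.

    `events` is a list of (key, press_time_in_seconds) tuples in typing order.
    Pairs containing a key that is missing from FINGER are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        if k1 not in FINGER or k2 not in FINGER:
            continue
        pair = (FINGER[k1], FINGER[k2])
        totals[pair] += t2 - t1
        counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Made-up timestamps, purely to show the call shape.
events = [("f", 0.00), ("r", 0.25), ("j", 0.40), ("u", 0.70)]
averages = mean_delay_by_finger_pair(events)
```

The resulting dictionary is keyed by `(first_finger, second_finger)`, which is the shape a finger-pair heatmap would need.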

Each cell in the following colormap represents the average time delay in seconds for some finger pair:

I use a Support Vector Machine (SVM) to predict keys based on observed time delay information.

Frequency Spectrum of Sound of Keystrokes

To find the frequency spectrum of each keystroke's sound, we need to segment each keystroke in the time domain and convert this region of the signal to the frequency domain.
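The segmentation step is not spelled out here, so the following is only one plausible way to do it: a short-time energy threshold that opens a fixed-length window whenever a frame gets loud enough. The function name `segment_keystrokes` and the parameters `frame_ms`, `threshold`, and `window_ms` are hypothetical, and their default values are placeholders rather than tuned settings.

```python
def segment_keystrokes(samples, sample_rate, frame_ms=10, threshold=0.01, window_ms=100):
    """Return (start, end) sample indices for each detected keystroke.

    Assumed approach: scan the signal in non-overlapping frames, and when a
    frame's mean squared amplitude exceeds `threshold`, emit a fixed-length
    window starting at that frame and resume scanning after the window.
    `samples` is a sequence of floats, e.g. normalized to [-1.0, 1.0].
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    window_len = int(sample_rate * window_ms / 1000)
    segments = []
    i = 0
    while i + frame_len <= len(samples):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            end = min(i + window_len, len(samples))
            segments.append((i, end))
            i = end  # skip past this keystroke before looking for the next
        else:
            i += frame_len
    return segments
```

A fixed window keeps every keystroke the same length, which is convenient for a fixed-size model input, but it will merge or clip keystrokes that arrive faster than `window_ms` apart.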

One option is to apply a single Discrete Fourier Transform (DFT) to the audio samples in the time window covering the entire keystroke.
Alternatively, we can apply the DFT to each of several consecutive smaller time windows that together cover the entire keystroke. This produces a spectrogram, which preserves some time-varying information over the duration of a keystroke.

For both of the above methods, I use a Neural Network to predict keys based on the observed frequency spectrum.
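The two transform options above can be sketched with a direct, pure-Python DFT. This is for illustration only: the function names `dft_magnitudes` and `spectrogram` and the `window_len` and `hop` parameters are assumptions, and the direct O(N^2) sum is far slower than an FFT routine such as `numpy.fft.rfft`, which a real pipeline would more likely use.

```python
import cmath
import math


def dft_magnitudes(window):
    """Magnitudes of the DFT bins 0..N//2 for a real-valued window.

    Only the non-negative frequency bins are returned, since the spectrum
    of a real signal is symmetric.
    """
    n = len(window)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(window))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, window_len, hop):
    """DFT magnitudes for consecutive windows of length `window_len`.

    Windows start every `hop` samples; a trailing partial window is dropped.
    The result is a list of frames, each a list of bin magnitudes.
    """
    return [dft_magnitudes(samples[s:s + window_len])
            for s in range(0, len(samples) - window_len + 1, hop)]


# Synthetic stand-in for one segmented keystroke: a pure tone in bin 4.
keystroke = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]

whole = dft_magnitudes(keystroke)                    # option 1: one spectrum
frames = spectrogram(keystroke, window_len=32, hop=16)  # option 2: spectrogram
```

The single DFT yields one feature vector per keystroke, while the spectrogram yields a small 2D array (time frames by frequency bins), trading frequency resolution for the time-varying detail mentioned above.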

Data Collection

All data is collected on QWERTY keyboard layouts for now, but we can likely reproduce the data collection and results for other layouts too.

Keyboards

HP Spectre laptop membrane keyboard
Dell SK-8115 membrane keyboard

Audio Recording

HP Spectre laptop microphone
44.1 kHz, mono

Envisioning Side Channel Attacks

Although the audio for these experiments was recorded through the laptop's own microphone, a hypothetical attack would use a recording device separate from the one receiving the typed input:

A smartphone recording audio of people typing in public spaces, such as offices or libraries.

Smart home devices passively recording audio may identify what you type on your laptop.
