Transferring Files Over Sound

Recently, I was stuck with a computer whose only outputs were the monitor and audio. I needed to get some files off it, and transferring them over sound waves was the only solution.


The method I used was binary frequency-shift keying (BFSK), which some modems use. It encodes digital data into an analog signal such as a sound wave.
The algorithm is simple:
1) Convert the data into binary form
2) For each bit in the data:
a) Produce a short analog signal at frequency F if the bit is 0
b) Produce a short analog signal at frequency Q, different from F, if the bit is 1
After that, the modulated analog signal can be transmitted and demodulated at the receiving end by looking at the frequency of the signal over time.

The program generates and reads .WAV files. You can find information about the format of .WAV files here, here, here and in many other places on Google.
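For reference, here is a minimal sketch of writing such a file with Python's standard wave module. The function name, sample rate, and tone parameters are illustrative choices of mine, not the ones my program uses:

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (CD-quality mono)

def write_tone_wav(path, freq_hz, duration_s):
    """Write a mono 16-bit PCM WAV file containing a single sine tone."""
    n_samples = int(SAMPLE_RATE * duration_s)
    frames = bytearray()
    for n in range(n_samples):
        sample = math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        # Scale [-1, 1] floats to signed 16-bit little-endian integers.
        frames += struct.pack('<h', int(sample * 32767))
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(bytes(frames))
```

The wave module takes care of the RIFF header fields, so only the sample data itself has to be packed by hand.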

I think the parts worth explaining are generating the modulated signal from the binary data and extracting the binary data back out of the modulated signal.

As the name suggests, the function “generateBFSKedSineWave” generates the modulated analog signal. It loops through each byte of the given data, splits the current byte into its bits, generates a sine wave at the mark frequency if the current bit is 1 or a sine wave at the space frequency if the current bit is 0, and then silence for some time. The silence is necessary so that waves at the two frequencies can easily be distinguished after transmission, regardless of noise. The mark and space frequencies should be chosen for the same reason, so the difference between them is large. Additionally, the sample count of the signal generated for each bit should be a power of two. This isn't strictly necessary for the Goertzel algorithm, but my program gives wrong results otherwise, for a reason I couldn't figure out. Also note that waves at the mark frequency oscillate at twice the rate of waves at the space frequency; this way every wave spans the same length, which makes demodulating the sound after transmission easy. If this were not the case, we would have to parse the sound by its silent gaps and analyse each wave, and parsing by silence is harder than using a fixed time length.
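To make the modulation step concrete, here is a rough Python sketch of the same idea. The names and constants (mark at twice the space frequency, a power-of-two sample count per bit, a short silence gap) are my illustration, not the actual generateBFSKedSineWave code:

```python
import math

SAMPLE_RATE = 44100
SPACE_FREQ = 1000.0           # assumed frequency for bit 0
MARK_FREQ = 2 * SPACE_FREQ    # mark is twice space, so both waves span the same length
SAMPLES_PER_BIT = 1024        # power of two, as required for the Goertzel step
SILENCE_SAMPLES = 512         # gap so consecutive waves stay distinguishable

def modulate(data: bytes):
    """Return float samples encoding each bit of `data` as a tone plus silence."""
    samples = []
    for byte in data:
        for i in range(7, -1, -1):                  # most significant bit first
            bit = (byte >> i) & 1
            freq = MARK_FREQ if bit else SPACE_FREQ
            for n in range(SAMPLES_PER_BIT):
                samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
            samples.extend([0.0] * SILENCE_SAMPLES)  # silence separator
    return samples
```

Each byte becomes eight fixed-length tone bursts separated by silence, which is what makes the fixed sample-offset demodulation described below possible.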

The function “extractDataFromBFSKedSineWave” demodulates the modulated signal. Starting from a given start point, it takes a number of samples, analyses them with the Goertzel algorithm, writes a 0 if they constitute a signal at the space frequency or a 1 if they constitute a signal at the mark frequency, and jumps to the next start point. The reason for using a sample offset, which states the difference between the starting samples of two consecutive waves, is that the start and end points of the individual waves, and of the whole recording, shift during transmission because of noise and the start time of the recording. The sample offset is set manually by looking at the modulated wave in audio editing software and roughly estimating the number of samples between each wave.
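For reference, the Goertzel algorithm itself boils down to a short recurrence that measures a signal's power at one target frequency. This is a generic sketch, not the exact routine from my program:

```python
import math

def goertzel_power(samples, target_freq, sample_rate):
    """Return the power of `samples` at `target_freq` via the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # nearest DFT bin to the target
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                         # second-order IIR recurrence
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the DFT bin, computed from the final two states.
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
```

To classify a block, compute its power at the mark frequency and at the space frequency and output whichever bit corresponds to the larger one.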

Here is an example to clarify how the process goes. Prepare a sample text file, preferably a small one, because transmission of large files produces lots of errors. Run the program, choose modulation, type the name of the file to be encoded, type the name of the WAV file to be generated, and enter a character to exit. Connect your headset to the PC and get your cellphone or another recording device. Play the generated WAV file on the PC while recording it on the phone through the headset; the recording app on the cellphone should save the sound in WAV format. The purpose of the headset is to reduce the effect of background noise. After recording, send the recorded sound file from your cellphone to a PC on which this program and audio editing software are installed (I used Audacity). Open the received WAV file in the audio editor and delete the silent parts at the beginning and the end, which are caused by the time difference between pressing the play button on the PC and the record button on the cellphone. After deleting these parts, the sound wave should look like the image below.



As you can see, the audio wave contains low-frequency and high-frequency parts: the low frequency is the space frequency and the high frequency is the mark frequency. They don't have to match the frequencies we used to generate the WAV file exactly; it is enough that we can see the difference by looking at the wave. Also notice the silence between the parts. Now we should determine the sample offset, which states the difference in samples between the starting samples of two consecutive waves to be analysed. To be more specific, after analysing a wave part, the program jumps forward from that part's starting sample by the sample offset and takes the resulting position as the starting sample of the next wave part to be analysed. As can be seen from the image below, the sample offset can be determined using audio editing software.
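The jumping logic can be sketched like this. To keep the example self-contained I classify each block by a simple zero-crossing count instead of the Goertzel analysis, and all the parameter names are mine:

```python
SAMPLE_RATE = 44100  # assumed sample rate of the recording

def demodulate(samples, start, sample_offset, samples_per_bit, n_bits, threshold_hz):
    """Step through `samples` at fixed offsets and classify each block as 0 or 1."""
    bits = []
    pos = start
    for _ in range(n_bits):
        block = samples[pos:pos + samples_per_bit]
        # Estimate the block's frequency from its zero-crossing count
        # (a stand-in for Goertzel, to keep the sketch short).
        crossings = sum(1 for a, b in zip(block, block[1:]) if (a >= 0) != (b >= 0))
        freq = crossings * SAMPLE_RATE / (2.0 * len(block))
        bits.append(1 if freq > threshold_hz else 0)  # above threshold = mark = 1
        pos += sample_offset  # jump to the start of the next wave part
    return bits
```

The key point is that `pos` advances by a fixed `sample_offset` each iteration, which is exactly why the value read off in the audio editor has to be so precise.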


Unfortunately, the program is extremely sensitive to the sample offset. Change it even by one and the resulting demodulated file will probably be totally different from what you expect, so if you get nonsense results, try adjusting the sample offset. One way to overcome this problem would be to parse the audio by its silent gaps. Determining where each silent gap begins and ends is difficult, but it should not be impossibly hard.
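A silence-based parser could look roughly like the sketch below; the threshold values are guesses that would have to be tuned against a real recording:

```python
def find_wave_starts(samples, silence_threshold=0.05, min_silence_run=100):
    """Return the indices where a wave part begins after a run of near-silence.

    A sample counts as near-silent when its amplitude is below
    `silence_threshold`; a wave start is the first loud sample after at
    least `min_silence_run` consecutive near-silent samples.
    """
    starts = []
    silent_run = 0
    in_silence = True  # treat the very beginning of the recording as silence
    for i, s in enumerate(samples):
        if abs(s) < silence_threshold:
            silent_run += 1
            if silent_run >= min_silence_run:
                in_silence = True
        else:
            if in_silence:
                starts.append(i)  # first loud sample after a silent gap
                in_silence = False
            silent_run = 0
    return starts
```

Requiring a minimum run of silent samples matters because a sine wave itself dips below the threshold briefly at every zero crossing, and those dips must not be mistaken for gaps.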
