Using data science to recreate a simple Soundwave Tattoo app
If you haven’t heard, there’s a new wave of tattoos (pun intended). Traditionally, you go to your tattoo artist, collaborate on or dictate an image you want on your skin, and they oblige (or not). There’s nothing more to it than a creative, or not so creative, image. Skin Motion, founded in 2017 in California, is a company that now provides an opportunity to keep audio in the form of a tattoo. Check out the example video on their homepage; it’s mind-blowing!
As a data science student, I immediately tried looking for their code to see how they commit such sorcery. After 5 minutes of not finding anything in my search, I read on their site, “With our unique patent pending technology, turn up to 30 seconds of sound into a tattoo you can listen to with your mobile device.” Clearly I’m not going to be able to see under the hood how this works…so I decided to try and recreate it. I won’t go into details of my emotional process during this project (it was a rollercoaster). Rather, I’ll explain step-by-step how I recreated this awesomeness.
Disclaimer: I am not attempting to fully recreate their technology; my recreation simply offers insight into how it might work.
How it works
On Skin Motion’s FAQ page they explain how it works and answer some questions I initially had. If you’re thinking of getting one, pricing and other info can be found on that page.
1. Create an account
2. Upload an audio file (less than 30 seconds, as stated on their homepage)
3. Skin Motion generates the shape of a Soundwave from that audio file
“A combination of audio processing, image recognition, computer vision, and cloud computing to create a mixed reality experience.”
My initial thought was that their technology actually reads the wave, i.e. recovers a function of that wave. But after discussing it with my colleagues, and many hours of research, I concluded that they generate an image. This image is then stored as a tag in their database, along with the corresponding audio.
4. Customize (to an extent) the generated Soundwave and approve the final design
5. Get the design tattooed by one of their Soundwave-verified tattoo artists
6. Use the app to hear the audio that’s on your skin
Process of Recreation
- Get audio and convert it to the .wav format
- Visualize the audio as an image
- Get the image signature and store it, other parameters, and the audio (in bytes) in a database
- Get a tattoo (or just print the wave)
- Take a picture of that tattoo (I didn’t incorporate object detection, so the tattoo image must be stationary [a .png file])
- Get the image signature of the picture and query the database for the image signatures of stored audios
- Find the match: the stored image signature with the shortest distance to the picture’s signature
- Play the audio of that match
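The first two steps, loading a WAV file and rendering its waveform as a bare image, can be sketched as follows. This is my own minimal sketch, not the repo’s code; the function name, figure size, and the 16-bit PCM assumption are all mine:

```python
import wave

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np


def save_waveform_image(wav_path, png_path):
    """Render the samples of a WAV file as a bare waveform image."""
    with wave.open(wav_path, "rb") as wf:
        frames = wf.readframes(wf.getnframes())
    # Assumes 16-bit PCM audio, as ffmpeg produces by default for .wav
    samples = np.frombuffer(frames, dtype=np.int16)

    fig, ax = plt.subplots(figsize=(10, 2))
    ax.plot(samples, color="black", linewidth=0.3)
    ax.axis("off")  # strip axes so only the wave shape remains
    fig.savefig(png_path, bbox_inches="tight", dpi=200)
    plt.close(fig)
```

The axes are stripped on purpose: only the shape of the wave should end up on skin, so the saved image contains nothing but the waveform.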
A link to my GitHub repo, which contains the scripts and sample audios used for creating the database, can be found here. You can fork the repo, delete the database, and start from scratch to see the magic for yourself! Be sure to start with the seed file if you’re adding to/constructing the database, and follow the functions in order. Then move on to the query file to find the match! In between, print a wave you want to sample (i.e. get a tattoo of), take a picture, and try to find the match with the query file.
You will need to ensure the audio file is in .wav format and that the images you store and test are in .png format. I used voice memos on my iPhone to record samples of my classmates. These files are saved as .m4a; to convert, run in a terminal:
ffmpeg -i input.m4a output.wav
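Storing the audio bytes and parameters (step 3 above) can be sketched with Python’s built-in sqlite3 and wave modules. The table layout and names here are my own placeholders, not the schema from my repo:

```python
import sqlite3
import wave


def store_audio(db_path, name, wav_path, signature_bytes):
    """Insert one audio clip, its WAV parameters, and its image signature."""
    with wave.open(wav_path, "rb") as wf:
        params = wf.getparams()  # channels, sample width, frame rate, ...
        frames = wf.readframes(wf.getnframes())

    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS soundwaves (
               name TEXT PRIMARY KEY,
               nchannels INTEGER, sampwidth INTEGER, framerate INTEGER,
               audio BLOB, signature BLOB)"""
    )
    conn.execute(
        "INSERT OR REPLACE INTO soundwaves VALUES (?, ?, ?, ?, ?, ?)",
        (name, params.nchannels, params.sampwidth, params.framerate,
         frames, signature_bytes),
    )
    conn.commit()
    conn.close()
```

Keeping the WAV parameters (channels, sample width, frame rate) alongside the raw bytes is what makes it possible to play the audio back later exactly as it was recorded.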
Above is a picture (.png format) I took of a wave as I was entering sample audios into the database. I used this image to query the database, and it returned the corresponding audio!
How finding a match works:
First, an image signature is created for each image and stored in the database along with its corresponding audio and audio parameters. For information on how image signatures are generated, refer to this paper. The tattoo image you’re trying to match gets an image signature as well. Note: an image signature is just an array of integers. With ImageSignature from image_match.goldberg, you can find the normalized distance between two image signatures.
||b - a|| / (||b|| + ||a||)
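The formula is straightforward to re-implement directly in NumPy. This is a plain sketch of the math itself, equivalent to what image_match computes with its normalized_distance method:

```python
import numpy as np


def normalized_distance(a, b):
    """||b - a|| / (||b|| + ||a||): 0.0 for identical signatures,
    approaching 1.0 for very different ones."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.linalg.norm(b - a) / (np.linalg.norm(b) + np.linalg.norm(a))


# Identical signatures give distance 0.0; opposite vectors give 1.0
print(normalized_distance([1, 2, 3], [1, 2, 3]))    # → 0.0
print(normalized_distance([1, 0], [-1, 0]))         # → 1.0
```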
When querying the database, the distance is calculated between each stored image and the testing/tattoo image. The image that returns the shortest distance is the most likely match, and its audio is played back. The pitfall of this method is that someone can take a picture of any wave tattoo not stored in the database, query the database, and still get audio played back, because the query simply looks for the shortest distance to the testing image. To fix this, you can set a threshold. From my reading, a distance of 0.4 or lower means the images are most likely a match. However, when finding distances between images that should match, I got values from 0.4–0.7. This is probably where extra computer vision and image processing would come into play. As mentioned, I’m not trying to fully recreate the technology, just to get an understanding of how it might work. If you’re interested in testing this project for yourself, check out my GitHub repo!
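Adding that threshold to the query might look like the sketch below. The 0.4 cutoff comes from my reading as described above; the in-memory dict of stored signatures is a stand-in for the real database query, and the function name is my own:

```python
import numpy as np


def find_match(query_sig, stored, threshold=0.4):
    """Return the name of the closest stored signature, or None if even
    the best candidate is above the distance threshold."""
    query_sig = np.asarray(query_sig, dtype=float)
    best_name, best_dist = None, float("inf")
    for name, sig in stored.items():
        sig = np.asarray(sig, dtype=float)
        dist = np.linalg.norm(sig - query_sig) / (
            np.linalg.norm(sig) + np.linalg.norm(query_sig))
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject the best match if it is still too far away
    return best_name if best_dist <= threshold else None
```

Without the final threshold check, this function would always return *some* name, which is exactly the pitfall described above: a picture of an unknown tattoo would still play back whichever stored audio happens to be least dissimilar.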