Thing Translator with Google Cloud Vision and Translate API
When you travel to a country whose language you don't speak, Google Translate is an invaluable tool. In most cases, you speak to the app or type in a word to get the translation. Nowadays, the camera feature in Google Translate lets you point your camera at text written in another language and get a translation into your native one.
When I was wondering whether I could bring such a camera translation feature to the web, I came across the web app "Thing Translator". Developed as part of Google's AI Experiments project, it lets you point your phone (or laptop) at things and hear how to say them in a different language.
Thing Translator Demo
GitHub repository
You can download the complete code of the demo above from the repository: https://github.com/dmotz/thing-translator
Implementation
Behind the scenes, Thing Translator uses Google's Cloud Vision and Translate APIs. In this post, I will walk through the code so that you can build this cool app yourself.
# Step 1: Register for the Google Cloud APIs
Because this web app uses Google Cloud APIs, to get started you need to set up a Google Cloud project and enable two of its APIs: Cloud Vision and Cloud Translation.
You might be afraid that Google will charge you for this experiment. Don't worry, an experiment like this should cost you nothing:
- Google offers a $300 credit to get started with GCP for free
- The first 1,000 requests to the Vision API and the first 500,000 characters sent to the Translate API are free every month
- You can cap your API usage
Google provides step-by-step instructions for setting this up; more details can be found in the Google Cloud API getting started guide.
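As a convenience, if you already have the gcloud CLI installed and a project selected, the two services can also be enabled from the command line (you can do the same from the Cloud Console UI):
gcloud services enable vision.googleapis.com
gcloud services enable translate.googleapis.com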
# Step 2: Clone and build the repository
Next, clone or download the Thing Translator repository from GitHub:
git clone https://github.com/dmotz/thing-translator.git
You will need to set your API key in src/config.js.
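For orientation, here is a rough sketch of what src/config.js provides (illustrative only; the actual file in the repo may be organized differently). The key point is that your API key ends up embedded in the two REST endpoints that the code below refers to as apiUrls.cloudVision and apiUrls.translate:
// src/config.js (illustrative sketch only; check the actual file in the repo)
const apiKey = 'YOUR_API_KEY' // paste the key created in the Cloud Console

module.exports = {
  apiUrls: {
    // Vision REST endpoint for image annotation
    cloudVision: `https://vision.googleapis.com/v1/images:annotate?key=${apiKey}`,
    // Translate v2 endpoint; the trailing ?key=... lets the app
    // append further query parameters with &
    translate: `https://translation.googleapis.com/language/translate/v2?key=${apiKey}`
  }
}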
To start a development server on port 9966 that watches for code changes, simply run:
npm start
To optimize the output for production, run:
npm run build
Now you can have fun playing with this camera translation web app and learn how to say things in another language.
If you are also curious about how it works behind the scenes, follow along below as we dive into the code.
Usage of Google Cloud Vision API
In src/effects/snap.js, you can find the code that calls the Google Cloud Vision API in the snap function. The Vision API can perform LABEL_DETECTION on an image file by sending the contents of the image file as a base64-encoded string in the body of your request.
xhr.post(
  apiUrls.cloudVision,
  {
    json: {
      requests: [
        {
          image: {
            // Grab the current frame from the canvas as a JPEG data URL,
            // then strip the prefix to leave the raw base64 content
            content: state.canvas
              .toDataURL('image/jpeg', 1)
              .replace('data:image/jpeg;base64,', '')
          },
          // Ask for up to 10 label guesses for the image
          features: {type: 'LABEL_DETECTION', maxResults: 10}
        }
      ]
    }
  },
  (err, res, body) => {
    let labels
    // Fall back to an empty label list on any error or empty response
    if (
      err ||
      !body.responses ||
      !body.responses.length ||
      !body.responses[0].labelAnnotations
    ) {
      labels = []
    } else {
      console.log(body)
      labels = body.responses[0].labelAnnotations
    }
    // Hand the labels off to the translate effect,
    // then end the snap animation shortly after
    send('translate', labels, done)
    setTimeout(send.bind(null, 'endSnap', done), 200)
  }
)
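For comparison, here is roughly what the same call looks like as a raw REST request with curl, assuming you have saved the JSON request body shown above into a file named request.json and substituted your own API key:
curl -X POST \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY"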
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format. A LABEL_DETECTION response includes the detected labels, their score, topicality, and an opaque label ID:
{
  "responses": [
    {
      "labelAnnotations": [
        {
          "mid": "/m/0838f",
          "description": "Water",
          "score": 0.9388793,
          "topicality": 0.9388793
        },
        {
          "mid": "/m/039jq",
          "description": "Glass",
          "score": 0.85439134,
          "topicality": 0.85439134
        },
        {
          "mid": "/m/0dx1j",
          "description": "Town",
          "score": 0.8481104,
          "topicality": 0.8481104
        },
        {
          "mid": "/m/01z8xg",
          "description": "Transparent material",
          "score": 0.76001805,
          "topicality": 0.76001805
        },
        {
          "mid": "/m/0271t",
          "description": "Drink",
          "score": 0.7353828,
          "topicality": 0.7353828
        }
      ]
    }
  ]
}
Usage of Google Cloud Translate API
In src/effects/translate.js, you can find the code that calls the Google Cloud Translate API in the translate function. Below are the query parameters that we pass:
- q – The input text to translate.
- source – The language of the source text.
- target – The language to use for the translation of the input text.
xhr.get(
  `${apiUrls.translate}&q=${term}&source=en&target=${
    langMap[state.activeLang]
  }`,
  (err, res, body) => {
    if (err) {
      return failureState()
    }
    console.log(body)
    // Unescape any HTML entities the API may return (e.g. &#39;)
    const translation = he.decode(
      JSON.parse(body).data.translations[0].translatedText
    )
    send('setLabelPair', {label: he.decode(term), translation, guesses}, done)
    // Speak the translation in the target language,
    // then speak the original English term
    speak(
      translation,
      state.activeLang,
      speak.bind(null, term, state.targetLang)
    )
    // Cache the translation so repeated snaps don't re-query the API
    cache[state.activeLang][term] = translation
  }
)
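You can also try the same call outside the app with curl (a sketch; substitute your own API key). For example, translating the label "water" from English to Japanese:
curl "https://translation.googleapis.com/language/translate/v2?key=YOUR_API_KEY&q=water&source=en&target=ja"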
If successful, the response body contains data with the following structure:
{
  "data": {
    "translations": [
      {
        "translatedText": "水"
      }
    ]
  }
}
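One detail worth noting: the Translate API can return HTML entities inside translatedText, which is why the code above runs the result through the he library's decode function. A quick illustration:
const he = require('he')

// "l&#39;eau" (French for "the water") contains an HTML entity for the apostrophe
he.decode('l&#39;eau') // => "l'eau"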
Usage of browser Speech Synthesis API
Once it gets the translation text, the app utilizes the browser's Speech Synthesis API to convert text to speech and speak the translation aloud.
const {speechSynthesis, SpeechSynthesisUtterance} = window
const speechSupport = speechSynthesis && SpeechSynthesisUtterance

const speak = (text, lang, cb) => {
  // If the browser doesn't support speech synthesis, just run the callback
  if (!speechSupport) {
    cb && cb()
    return
  }
  const msg = new SpeechSynthesisUtterance()
  msg.text = text
  // Pick the preselected voice for the target language
  msg.lang = voices[voiceMap[lang]].lang
  msg.voiceURI = voices[voiceMap[lang]].voiceURI
  cb && msg.addEventListener('end', cb)
  if (text) {
    speechSynthesis.speak(msg)
  } else {
    cb && cb()
  }
}
The Speech Synthesis API converts normal language text into artificial speech, and it is now supported by most modern browsers.
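If you want to try the API on its own, here is a minimal standalone sketch, independent of Thing Translator's speak() helper; open your browser console and run:
// Speak the Japanese word for "water", using a BCP 47 language tag
const utterance = new SpeechSynthesisUtterance('水')
utterance.lang = 'ja-JP'
window.speechSynthesis.speak(utterance)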
Conclusion
This is one example of what you can build using Google's machine learning APIs without needing to dive into the details of machine learning itself. Without doubt, Google is one of the leading global companies in the field of machine learning and AI. By standing on the shoulders of giants, you can see further into the future.
Thank you for reading. If you like this article, please share it on Facebook or Twitter. Let me know in the comments if you have any questions. Follow me on Medium, GitHub and LinkedIn. Support me on Ko-fi.