Thing Translator with Google Cloud Vision and Translate API
When you travel to a country whose language you don't speak, Google Translate is an invaluable tool. In most cases, you speak to the app or type in a word to get the translation. Nowadays, the camera feature in Google Translate lets you point your camera at text written in another language and get a translation into your native one.
When I was wondering whether I could bring such a camera translation feature to the web, I came across the web app "Thing Translator". Developed as part of Google's AI Experiments project, it lets you point your phone (or laptop) at things and hear how to say them in a different language.
Thing Translator Demo
GitHub repository
You can download the complete code of the demo above from the repository: https://github.com/dmotz/thing-translator
Implementation
Behind the scenes, Thing Translator uses Google's Cloud Vision and Translate APIs. In this post, I will walk through the code so that you can build this cool app yourself.
# Step 1: Register for the Google Cloud APIs
Because this web app uses Google Cloud APIs, to get started you need to set up a Google Cloud project and enable two of its APIs: Cloud Vision and Cloud Translation.
You might be afraid that Google will charge you for this experiment. Don't worry, an experiment like this should cost you nothing:
- Google offers a $300 credit to get started with GCP for free
- The first 1,000 requests to the Vision API and the first 500,000 characters sent to the Translate API are free every month
- You can cap your API usage
Google provides step-by-step instructions for setting this up; more details can be found in the Google Cloud API getting started guide.
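As a convenience, if you already have the gcloud CLI installed and a project selected, the two services can also be enabled from the command line (you can do the same from the Cloud Console UI):
gcloud services enable vision.googleapis.com
gcloud services enable translate.googleapis.com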
# Step 2: Clone and build the repository
Next, clone or download the Thing Translator repository from GitHub:
git clone https://github.com/dmotz/thing-translator.git
You will need to set your API key in src/config.js.
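For orientation, here is a rough sketch of what src/config.js provides (illustrative only; the actual file in the repo may be organized differently). The key point is that your API key ends up embedded in the two REST endpoints that the code below refers to as apiUrls.cloudVision and apiUrls.translate:
// src/config.js (illustrative sketch only; check the actual file in the repo)
const apiKey = 'YOUR_API_KEY' // paste the key created in the Cloud Console

module.exports = {
  apiUrls: {
    // Vision REST endpoint for image annotation
    cloudVision: `https://vision.googleapis.com/v1/images:annotate?key=${apiKey}`,
    // Translate v2 endpoint; the trailing ?key=... lets the app
    // append further query parameters with &
    translate: `https://translation.googleapis.com/language/translate/v2?key=${apiKey}`
  }
}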
To start a development server on port 9966 that watches for code changes, simply run:
npm start
To optimize the output for production, run:
npm run build
Now you can have fun playing with this camera translation web app and learn how to say things in another language.
If you are also curious about how it works behind the scenes, follow along below as we dive into the code.
Usage of Google Cloud Vision API
In src/effects/snap.js, you can find the code that calls the Google Cloud Vision API in the snap function. The Vision API can perform LABEL_DETECTION on an image file by sending the contents of the image file as a base64-encoded string in the body of your request.
xhr.post(
  apiUrls.cloudVision,
  {
    json: {
      requests: [
        {
          image: {
            // Grab the current frame from the canvas as a JPEG data URL,
            // then strip the prefix to leave the raw base64 content
            content: state.canvas
              .toDataURL('image/jpeg', 1)
              .replace('data:image/jpeg;base64,', '')
          },
          // Ask for up to 10 label guesses for the image
          features: {type: 'LABEL_DETECTION', maxResults: 10}
        }
      ]
    }
  },
  (err, res, body) => {
    let labels
    // Fall back to an empty label list on any error or empty response
    if (
      err ||
      !body.responses ||
      !body.responses.length ||
      !body.responses[0].labelAnnotations
    ) {
      labels = []
    } else {
      console.log(body)
      labels = body.responses[0].labelAnnotations
    }
    // Hand the labels off to the translate effect,
    // then end the snap animation shortly after
    send('translate', labels, done)
    setTimeout(send.bind(null, 'endSnap', done), 200)
  }
)
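For comparison, here is roughly what the same call looks like as a raw REST request with curl, assuming you have saved the JSON request body shown above into a file named request.json and substituted your own API key:
curl -X POST \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY"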
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format. A LABEL_DETECTION response includes the detected labels, their score, topicality, and an opaque label ID:
{
  "responses": [
    {
      "labelAnnotations": [
        {
          "mid": "/m/0838f",
          "description": "Water",
          "score": 0.9388793,
          "topicality": 0.9388793
        },
        {
          "mid": "/m/039jq",
          "description": "Glass",
          "score": 0.85439134,
          "topicality": 0.85439134
        },
        {
          "mid": "/m/0dx1j",
          "description": "Town",
          "score": 0.8481104,
          "topicality": 0.8481104
        },
        {
          "mid": "/m/01z8xg",
          "description": "Transparent material",
          "score": 0.76001805,
          "topicality": 0.76001805
        },
        {
          "mid": "/m/0271t",
          "description": "Drink",
          "score": 0.7353828,
          "topicality": 0.7353828
        }
      ]
    }
  ]
}
Usage of Google Cloud Translate API
In src/effects/translate.js, you can find the code that calls the Google Cloud Translate API in the translate function. Below are the query parameters that we pass:
- q – The input text to translate.
- source – The language of the source text.
- target – The language to use for the translation of the input text.
xhr.get(
  `${apiUrls.translate}&q=${term}&source=en&target=${
    langMap[state.activeLang]
  }`,
  (err, res, body) => {
    if (err) {
      return failureState()
    }
    console.log(body)
    // Unescape any HTML entities the API may return (e.g. &#39;)
    const translation = he.decode(
      JSON.parse(body).data.translations[0].translatedText
    )
    send('setLabelPair', {label: he.decode(term), translation, guesses}, done)
    // Speak the translation in the target language,
    // then speak the original English term
    speak(
      translation,
      state.activeLang,
      speak.bind(null, term, state.targetLang)
    )
    // Cache the translation so repeated snaps don't re-query the API
    cache[state.activeLang][term] = translation
  }
)
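You can also try the same call outside the app with curl (a sketch; substitute your own API key). For example, translating the label "water" from English to Japanese:
curl "https://translation.googleapis.com/language/translate/v2?key=YOUR_API_KEY&q=water&source=en&target=ja"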
If successful, the response body contains data with the following structure:
{
  "data": {
    "translations": [
      {
        "translatedText": "水"
      }
    ]
  }
}
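One detail worth noting: the Translate API can return HTML entities inside translatedText, which is why the code above runs the result through the he library's decode function. A quick illustration:
const he = require('he')

// "l&#39;eau" (French for "the water") contains an HTML entity for the apostrophe
he.decode('l&#39;eau') // => "l'eau"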
Usage of browser Speech Synthesis API
Once it gets the translation text, the app utilizes the browser's Speech Synthesis API to convert text to speech and speak the translation aloud.
const {speechSynthesis, SpeechSynthesisUtterance} = window
const speechSupport = speechSynthesis && SpeechSynthesisUtterance

const speak = (text, lang, cb) => {
  // If the browser doesn't support speech synthesis, just run the callback
  if (!speechSupport) {
    cb && cb()
    return
  }
  const msg = new SpeechSynthesisUtterance()
  msg.text = text
  // Pick the preselected voice for the target language
  msg.lang = voices[voiceMap[lang]].lang
  msg.voiceURI = voices[voiceMap[lang]].voiceURI
  cb && msg.addEventListener('end', cb)
  if (text) {
    speechSynthesis.speak(msg)
  } else {
    cb && cb()
  }
}
The Speech Synthesis API converts normal language text into artificial speech, and it is now supported by most modern browsers.
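If you want to try the API on its own, here is a minimal standalone sketch, independent of Thing Translator's speak() helper; open your browser console and run:
// Speak the Japanese word for "water", using a BCP 47 language tag
const utterance = new SpeechSynthesisUtterance('水')
utterance.lang = 'ja-JP'
window.speechSynthesis.speak(utterance)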
Conclusion
This is one example of what you can build using Google's machine learning APIs without needing to dive into the details of machine learning itself. Without doubt, Google is one of the leading global companies in the field of machine learning and AI. By standing on the shoulders of giants, you can see further into the future.
Thank you for reading. If you like this article, please share it on Facebook or Twitter. Let me know in the comments if you have any questions. Follow me on Medium, GitHub and LinkedIn. Support me on Ko-fi.