A new tool for an old question - "Which Online Translation Engine Works Best?" (Machine Translation (MT))


Thread poster: Gabble On (X)

Gabble On (X)
Local time: 00:39

Feb 16, 2010

I've built a simple tool to help answer an old question of mine: “Which engine translates best?” You can find it here: www.gabble-on.com

Throughout three years of high school Spanish and three years of college Chinese, I used translation sites like BabelFish a lot. They were far from perfect and I always rotated between several sites to try to find the best tool for any given situation.

Ten years later, my question still hasn’t been answered. So I’ve put together this open research project to allow anyone who speaks two languages to type any phrase into our dynamic engine, compare the results of multiple translation engines side by side, and vote on the best.

I appreciate your help and I hope you’re curious too. My goal is to collect 10,000 votes over the 6 weeks between February 15th and March 29th, analyze the data, and publish the results.

As a thank you for everyone's participation in this project, I'm also holding a fun little March Madness contest with an iPad as a giveaway to one lucky winner. Come check it out! www.gabble-on.com

- Ethan

erikl
English to Finnish

An approach to calculate the best Online Translation Engine

Mar 23, 2010

Hello Ethan,

I cast my vote already with English to Spanish, Finnish, and Swedish.
I look forward to seeing the results.

FYI: We have experimented with an automated way of finding the best translation. For practical reasons, we used the same set of online translation engines as you do.

If you're interested in our approach please check the following paper
http://help.multilizer.com/documents/research/MultiMT%20Technology%20Overview.pdf
or drop me an email.

Best regards,

Erik Lindberg

[Edited at 2010-03-23 14:12 GMT]

Kirti Vashee

United States
Local time: 21:39

A deeply flawed approach?

Apr 17, 2010

It is not clear what will really be proved by this experiment.

Several professionals have commented on the shortcomings of this approach; those who want the details can read them in a blog entry at:
http://kv-emptypages.blogspot.com/2010/03/ongoing-quest-for-best-mt-translation.html

My personal sense is that this is a pretty meaningless exercise unless one has some upfront clarity on why you are doing this. It depends on what you measure, how you measure, for what objective and when you measure. On any given day, any one of these engines could be the best for what you specifically want to translate. Measuring random snippet translations on baseline capabilities will only provide the crudest measure, one that may or may not be useful to a casual internet user but is completely useless for understanding the possibilities that exist for professional enterprise use, where you hopefully have a much more directed purpose. In the professional context, knowledge about customization strategies and key control parameters is much more important. The more important question for the professional is: Can I make it do what I want relatively well and relatively easily?

This is another criticism by Alon Lavie who is a professor of computational statistics at CMU:

Ethan Shen of Gabble On “is hoping to be able to detect predictive patterns in the data that he could use to predict future engine performance. But he has no control over the input data (participants choose to translate anything they want), and he's collecting just about no real extrinsic information about the data. So beyond very basic things such as language-pair and length of source, he's unlikely to find any characteristics that are predictive of any future performance with any certainty whatsoever.

What can be done (but Ethan is not doing) is to use intrinsic properties of the MT translations themselves (for example, word and sequence agreement between the MT translations) to identify the better translation. In MT research, that's called "hypothesis selection". My students and I work extensively on a more ambitious problem than that - we do MT system combination, where we attempt to create a new and improved translation by combining pieces from the various original MT translations. Rather than select which translation is best, we leverage all of them. We have had some significant success with this. At the NIST 2009 evaluation, we (and others working on this) were able to get improvements of about six BLEU points beyond the best MT system for Arabic-to-English. That was about a 10% relative improvement. That was a particularly effective setting. Strong but diverse MT engines that each produce good but different translations are the best input to system combination.”
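To make "hypothesis selection" from word and sequence agreement concrete, here is a minimal sketch. It is not Lavie's system (and not system combination); it simply assumes that the candidate which agrees most, on average, with the other engines' outputs is the safest pick. Agreement is measured here as the mean unigram/bigram overlap F1 over whitespace tokens, an illustrative choice; the function names and engine labels are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_f1(a, b, max_n=2):
    """Mean n-gram overlap F1 between two token lists, for n = 1..max_n."""
    scores = []
    for n in range(1, max_n + 1):
        ca, cb = ngrams(a, n), ngrams(b, n)
        matches = sum((ca & cb).values())  # clipped matches (min of counts)
        total_a, total_b = sum(ca.values()), sum(cb.values())
        if matches == 0:
            scores.append(0.0)
            continue
        precision, recall = matches / total_a, matches / total_b
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


def select_hypothesis(candidates):
    """Return the engine whose output agrees most with the other outputs.

    candidates: dict mapping a (hypothetical) engine name to its translation.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    tokenized = {name: text.lower().split() for name, text in candidates.items()}
    if len(tokenized) == 1:
        return next(iter(tokenized))

    def consensus(name):
        others = [toks for other, toks in tokenized.items() if other != name]
        return sum(overlap_f1(tokenized[name], o) for o in others) / len(others)

    return max(tokenized, key=consensus)


outputs = {
    "engine_a": "the cat sat on the mat",
    "engine_b": "the cat sat on a mat",
    "engine_c": "the cat is sitting on the mat",
}
print(select_hypothesis(outputs))
```

With these three made-up outputs, `engine_a` shares the most words and word pairs with both of the others and is selected. Note that agreement is not the same as correctness: if two engines make the same mistake, consensus will favour it.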

Kirti Vashee

United States
Local time: 21:39

Clarification on post above

Apr 19, 2010

In the post above, I meant to say that Alon Lavie is a professor of Computational Linguistics (NLP).

[Edited at 2010-04-19 21:11 GMT]

Neil Coffey

United Kingdom
Local time: 05:39
French to English

Flaws

Apr 20, 2010

Kirti Vashee wrote:
My personal sense is that this is a pretty meaningless exercise unless one has some upfront clarity on why you are doing this.

Well, actually they do present a list of hypotheses on the site. But in a sense, that's a flaw-- when you conduct an experiment and don't want your subjects to bias your results, you don't usually tell your subjects in advance what your hypothesis is...

Of course, every experiment has flaws, and you have to weigh up the difficulty of removing these flaws vs practical constraints. Some other problems in this case are:

- they say they want 10,000 votes -- but votes of *what*? Is this per language pair? What methodology have they used to estimate that this will be enough to get statistically significant results in the language pair with the likely lowest number of votes?
- how will they assess and compensate for natural biases in the type of people taking part in the experiment? (e.g. the site is in English and located in the US, so more people are likely to find the site in a US search engine configured for English; an MT system trained/designed more for US English will then be inherently likely to fare better)
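One way to answer the "is 10,000 enough?" question is a standard sample-size estimate. The sketch below assumes the simplest case: a head-to-head vote between two engines in one language pair, tested against a 50/50 split with a two-sided one-sample test of a proportion (normal approximation). The function name, the 5% significance level and the 80% power are illustrative assumptions, not anything the site states.

```python
import math
from statistics import NormalDist


def votes_needed(p_preferred, alpha=0.05, power=0.8):
    """Approximate votes needed to distinguish a preference rate p_preferred
    from 0.5 (two-sided one-sample proportion test, normal approximation)."""
    if not 0 < p_preferred < 1 or p_preferred == 0.5:
        raise ValueError("p_preferred must be in (0, 1) and differ from 0.5")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)  # quantile for the desired power
    p0 = 0.5  # null hypothesis: neither engine is preferred
    spread = (z_alpha * math.sqrt(p0 * (1 - p0))
              + z_beta * math.sqrt(p_preferred * (1 - p_preferred)))
    return math.ceil((spread / (p_preferred - p0)) ** 2)


for p in (0.60, 0.55, 0.52):
    print(p, votes_needed(p))
```

Under these assumptions, a 55/45 preference needs roughly 780 votes in a single language pair, and a 52/48 preference needs nearly 5,000. If 10,000 votes are spread over many language pairs (and three engines rather than two), the less popular pairs could easily fall short.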

Kirti Vashee wrote:
This is another criticism by Alon Lavie who is a professor of computational statistics at CMU:
Ethan Shen of Gabble On “is hoping to be able to detect predictive patterns in the data that he could use to predict future engine performance. But he has no control over the input data (participants choose to translate anything they want)

Though they do have control over how they *filter* the data they get-- e.g. they can say "we'll only include input between X and Y words in length"-- and this can be a viable approach if done properly. But they obviously need to be careful not to bias their results by "peeking" (e.g. they have to make decisions about how to filter using a sample of the data that is then removed from the data actually analysed, and the decision of which sentences are used for experimental design and which are actually analysed should be random).
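The "no peeking" procedure can be sketched as follows: randomly hold out a design sample, choose the filter using only that sample, then apply the frozen filter to the remaining analysis set. The 20% split, the word-count filter and the 10th/90th-percentile rule are all hypothetical choices made for illustration.

```python
import random


def split_design_analysis(items, design_fraction=0.2, seed=0):
    """Randomly split items into a design set and a disjoint analysis set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * design_fraction)
    return shuffled[:cut], shuffled[cut:]


def length_bounds(design_set, lower_q=0.1, upper_q=0.9):
    """Pick word-count bounds from the design set only (hypothetical rule)."""
    if not design_set:
        raise ValueError("design set is empty")
    lengths = sorted(len(s.split()) for s in design_set)
    lo = lengths[int(lower_q * (len(lengths) - 1))]
    hi = lengths[int(upper_q * (len(lengths) - 1))]
    return lo, hi


def apply_filter(analysis_set, bounds):
    """Keep analysis sentences whose word count lies within the bounds."""
    lo, hi = bounds
    return [s for s in analysis_set if lo <= len(s.split()) <= hi]


sentences = [f"sentence {i} " + "word " * (i % 15) for i in range(100)]
design, analysis = split_design_analysis(sentences, seed=42)
bounds = length_bounds(design)  # decided without looking at `analysis`
kept = apply_filter(analysis, bounds)
print(len(design), len(analysis), bounds, len(kept))
```

The key design point is that `length_bounds` never sees the analysis set, so the filter cannot be tuned, even unintentionally, toward the result being tested.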

and he's collecting just about no real extrinsic information about the data

Yes, that's a potential problem, though arguably one that can be overcome by collecting a large amount of data. (OTOH, I'm not sure that 10,000 sentences is large enough.)

So beyond very basic things such as language-pair and length of source, he's unlikely to find any characteristics that are predictive of any future performance with any certainty whatsoever.

Arguably true, but if you look at their actual list of hypotheses, they probably *are* collecting enough in principle for those specific hypotheses. (Whether the testing of those hypotheses tells us much about future performance of MT, I'm not sure...)

What can be done (but Ethan is not doing) is to use intrinsic properties of the MT translations themselves (for example, word and sequence agreement between the MT translations) to identify the better translation.

I think this definitely has some advantages in terms of experimental design (your measurements are more "objective"; you can effectively run any text through the system "instantaneously", so you can run arbitrary numbers of sentences, or sentences from well-defined sources). What I'd be interested to know is how you then remove the problem of circularity from your results-- in other words, if your experiment shows that Google Translate comes out top, how do you know that this result isn't biased by Google Translate using similar measures in their (essentially unpublished) training process to the ones that you're using in your evaluation?

Gabble On (X)
Local time: 00:39

TOPIC STARTER

www.gabble-on.com - Results of my Google Translate vs. Bing Translator vs. Yahoo Babelfish

May 3, 2010

I appreciate all of the input and comments that have been given in this forum, especially those very well supported ones from Alon Lavie.

I do agree that there are some flaws in the experimental design and assumptions; however, I think the results can still provide some interesting insight.

We've found that while Google Translate is widely preferred when translating long passages, Microsoft Bing Translator and Yahoo Babelfish often produce better translations for phrases below 140 characters. Also, in general Babelfish performs well in East Asian languages such as Chinese and Korean and Bing Translator performs well in Spanish, German, and Italian.

You can read the full results here - http://gabble-on.com/compare-translators/Phase1-research

This project is only the first in a series. Many of the comments you have given have been incorporated in the design of our current Phase 2 research which focuses on short phrases and will constrain some of the user's input options.

I hope you all will continue to take interest in my work and share it with your friends.

- Ethan
