Many businesses want to adopt MT, but face a seemingly impenetrable set of barriers: the cost of MT licenses, knowing which engines are available, understanding ease of customization, and working out how to measure ROI. The recent TAUS Executive Forum in Copenhagen helped shed light on how to break through.
Making machine translation easier Jaap van der Meer opened by summarizing the TAUS vision of overcoming barriers to help the world communicate better with the birth of a thousand MT engines.
Sharing the investment Achim Ruopp of Digital Silk Road followed with a call to action for the translation industry to learn from numerous successful open-source initiatives in other industries, and to organize and contribute back to the Moses statistical machine translation (SMT) initiative by filling the gaps left by the academic research community. Moses is by far the most widely used open-source MT engine. This government-funded project provides well-supported, stable, state-of-the-art SMT under the LGPL license. A growing body of use cases proves its viability as a commercial engine. No need for those expensive licenses then? But the free toolkit still lacks certain features needed for commercial use. A relatively minor effort would help ensure much broader usage. The graphic below identifies the gaps.
Where to look? It is widely understood that no one MT solution is the best in all scenarios. Engines that specialize in particular language pairs and are customized for specific domains tend to shine. But which magic wand is right for me? How do I benchmark which is the right MT option?
Two related TAUS initiatives seek to address these issues. The first, the TAUS Tracker, a directory of MT engines with detailed system overviews, will be available on this site within the next few weeks, helping buyers to create shortlists of potential providers.
Results of a pilot project to confirm viability of the second, the MT Trainer & Evaluator, were presented in Copenhagen. Yan Yu gave an overview of the successful TAUS Data Association (TDA) MT Trainer pilot to automate workflow for MT customization using client data and data from TDA.
Adobe, eBay and McAfee were the three prospective buyers seeking trained engines and metrics to measure the quality of output. Languagelens, Pangea MT, and Tilde turned around customized MT engines in 24 hours or less, and the quality of their output was measured (in this pilot) using BLEU scores. The pilot moves the industry one step closer to creating a marketplace that connects buyers and providers, with the added benefit of objective reporting to benchmark quality.
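For readers unfamiliar with BLEU, the idea can be sketched in a few lines of Python. This is only an illustration of the general metric, not the scoring setup used in the pilot, which was not described in detail. The function names are hypothetical, and the sketch assumes whitespace tokenization, a single reference translation, and no smoothing, whereas BLEU is normally computed over a whole test set.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

A score of 1.0 means the MT output reproduces the reference exactly; scores fall as fewer word sequences overlap, which is why BLEU is cheap to automate but says little about any individual sentence.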
A giant awakens Spyros Pilos explained the European Commission’s MT roadmap, which seeks to implement a best-of-breed approach to meet the massive demand for multilingual content at the EC. We learned that each EU citizen pays €2 per year for translation and that it would take 8,500 full-time translators per year to make europa.eu fully multilingual.
The EC’s existing rule-based engines were diligently improved from the 1970s to 2006, but are slow and expensive to develop in comparison to data-driven solutions. The coming months will see the EC conduct a giant benchmarking exercise to systematically assess MT engines by language coverage and type of use, whilst considering output quality, total cost of ownership and feasibility.
What to measure? The quality of MT output can be measured by humans or by automated metrics. Human evaluation is costly and time-consuming, but is useful for reviewing adequacy and fluency right down to the sentence level. Automated metrics are quicker, cheaper and more scalable, but aren’t intuitive or reliably granular. Alon Lavie of Carnegie Mellon University and Safaba ended the session with a breakdown of the challenges in creating better metrics to measure MT output quality. The graphic below identifies the gaps.
Unlocking language resources Two years ago TAUS shone a spotlight on a then closed and proprietary industry with its Localization Business Innovation White Paper. Major stakeholders responded with gusto, transforming the industry’s landscape irrevocably. Open standards and openness to connecting are now common practices. The success of Moses and the GlobalSight Initiative proves open source is a viable business strategy. From the TAUS perspective, the agenda now moves from opening up translation platforms to unlocking the potential of shared language resources. Language data has largely moved from the desktop to the enterprise server, and is now moving to the cloud.
Mega trends Paula Shannon outlined the megatrends of ubiquity and immediacy that motivate the creation of Lionbridge’s Translator Workspace and the partnership with IBM. A cloud computing Software-as-a-Service model and the potential to create customized MT engines using IBM’s technology form the two pillars to service the megatrends. Integration with TAUS Data Association’s super-cloud is planned to be completed by end-July.
Standards, sharing and growth At last year’s TAUS Executive Forum in Edinburgh participants’ imaginations were sparked by Lingotek’s introduction of social networking dynamics to the business of translation. Their platform also allows users to share translations for reuse in public or private (limited sharing) vaults. At this event Willem Stoeller drew a long breath before listing new partnerships and integrations for the Lingotek Collaborative Translation Platform. The list currently includes SharePoint, Drupal, Alfresco, Social CRM systems (Jive, Lithium), Google, PROMT, Microsoft Bing, Language Weaver, and Moses in partnership with Pangea MT. Jeremy Harpham outlined ways in which SDL is open by being involved with setting standards and connecting via APIs. David Filip of Moravia explained that metadata is important for creating ontologies to get the most out of shared language data once these move to the cloud.
Matching in the supercloud Much translation work has moved from a project basis to simship and is now moving to a near real-time or real-time basis. Quality of connectivity through the supply chain and ease of collaboration are becoming fundamental elements for any translation ecosystem to work efficiently. Smith Yewell spoke on GlobalSight Editions, a planned release of this open-source system that seeks to address these requirements. Explaining the business motivation for sponsoring the development of translation matching in the TDA supercloud, Smith focused on the potential to continue improving efficiency by seeking matches in the supercloud when ‘golden’ translation memories fail to deliver. Matching in the TAUS Data Association supercloud is to go live in October.
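The fallback idea, consulting a shared pool only when the in-house memory has no good match, can be sketched as follows. This is a toy illustration under stated assumptions, not how GlobalSight or the TDA supercloud implements matching: the function names, the 0.75 threshold, the dictionary-based memories and the sample segments are all hypothetical, and character similarity from Python’s standard difflib stands in for a real fuzzy-match score.

```python
from difflib import SequenceMatcher


def best_match(segment, memory):
    """Return (score, source, target) for the closest source segment, or None."""
    best = None
    for source, target in memory.items():
        # Character-level similarity in [0, 1]; a stand-in for a real TM score.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    return best


def lookup(segment, golden_tm, shared_tm, threshold=0.75):
    """Try the in-house ('golden') TM first, then fall back to the shared pool."""
    for name, memory in (("golden", golden_tm), ("shared", shared_tm)):
        match = best_match(segment, memory)
        if match is not None and match[0] >= threshold:
            return name, match[2], match[0]
    return None  # no usable match anywhere; translate from scratch or use MT


# Toy data, invented for illustration only.
golden = {"Save the file": "Datei speichern"}
shared = {"Print the document": "Dokument drucken"}
print(lookup("Print the document", golden, shared))
```

Checking the trusted memory first preserves the priority of vetted translations; the shared pool only contributes where the golden memory would otherwise return nothing.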
What language problem? Sergio Pelino made solving the language ‘problem’ look easy with a talk on the Google approach entitled ‘Translation as a utility, making the world’s information universally accessible and useful. Translation and collaboration in the Cloud.’ The world’s biggest language data user is also probably the sexiest innovator in the translation automation space, by virtue of rapidly adding languages to its MT engine, integrating MT across its applications suite, instant search and website translation, combining optical character recognition and MT, and causing disruption with the Translator Toolkit.
Convergence With better and more accessible machine translation and open platforms we begin to see convergence with other functions, and growth opportunities. Global customer support is just such an opportunity identified by TAUS and illuminated by the Consortium for Service Innovation (CSI). Greg Oxton of CSI summarized the evolution of the support function from call centers through to modern-day knowledge-centered support, and the growing demand for multilingual multimedia support.
A sample of 21 major translation buyers from the IT sector recently indicated their plans for translating support content. Seventy-two percent plan to increase the amount of content that is translated. The graphic below illustrates their preferred approach.
Daniel Grasmick explained the gradual evolution of SAP’s use case for MT in customer support, which relies on Lucy Software and its previous incarnations. The latest rule-based installation has been in place since 2004, and with ongoing investment it continues to perform well.
Fred Doyle presented IBM’s multilingual multimedia use case using Knowledge Accelerators solutions. The IBM help library is translated into 11 languages and contains 200,000 one-minute, task-specific tutorials, allowing users to see, hear and read instructions. Multilingual multimedia is used for sales support, implementation training, end-user acceptance, and setup and tuning. The result is shorter rollout times through improved training processes and reduced support costs. Fred ended by asking two questions – are your translation tools ready to support multimedia? And why not replace the traditional help file? The graphic below helps to illustrate the trend towards video usage on the web.
TAUS Data Association members’ experience Jaap kicked off the session by outlining TAUS Data Association’s (TDA) Development Roadmap.
Representatives from Adobe, Intel, KCSL, Logrus and Microsoft explained their motivations as members, their experience to date with using data from TDA, and their aims for the future. For all panel members the initial motivation was seeking quality data to get better MT output. Adobe and Intel have experienced serendipity and ROI with TAUS Search alone.
Buyers ultimately want to sell more products, and a scalable translation operation through MT supports this, particularly when major growth markets tend to be in non-English-speaking locales.
The significant improvement in Microsoft’s MT engine has been well documented. Further gains have been made for second-tier languages where Microsoft does not have sufficient data on its own. Adobe expressed the same motivation, adding that a trusted data source is helpful for reducing complexity.
Microsoft has also started to look into leveraging TDA data. Intel’s tests using advanced leveraging on TDA data and its own data resulted in better quality translation, but not greater productivity. TDA data is being used by Intel to train Moses engines for comparative purposes.
KCSL’s highly positive experience is also well documented. TDA data helped ensure Logrus had enough data to train its Moses engine for English to Russian. Whilst the diversity of data proved to be a plus for Microsoft, Logrus found this detrimental to quality.
The TDA Development Roadmap is based on member feedback and includes features such as statistical TM cleaning to flag bad translations, and matching scores to help select data at a more granular level and better manage terminological diversity. Detailed feedback from members, such as that from Logrus, is being used to ensure new features are built to service the industry’s evolving needs going forward.
Collective wisdom The final afternoon saw participants report back on group discussions that had taken place throughout the event, highlighting what they saw as the key trends and implications for the language business.
Participants had reviewed the five-year horizon on scenarios covering legal/political issues, customer requirements, localization process, business metrics, and economic issues. This analysis helps to complete the first step in a six-stage process using the scenario-based planning approach to assess possible future states for the language business.