Exploring the Intersection of Big Data and Machine Translation
- 11 March 2013
- Written by Rex Martin Jr.
“Big Data” is one of the most interesting, and at times misunderstood, terms in our global digital universe. It carries a different meaning and context for different governments, academics, companies, technologies, and social experiences. The one undisputed fact, regardless of how the term is used, is that our digital universe is expected to reach 40 zettabytes of data by 2020, with an estimated 2.8 ZB created in 2012. That is 14% beyond previous forecasts. To put this into real-world context, 40 ZB is equal to 57 times the number of all the grains of sand on all the beaches on Earth.
Geographically, the location of the world’s data is set to undergo a shift. Currently, emerging markets account for 36% of the world’s data, but that share is set to increase to 62% by 2020. The current global breakdown is: US – 32%, Western Europe – 19%, China – 13%, India – 4%, rest of the world – 32%. By 2020, IDC estimates, nearly 40% of the data will be stored or processed in a cloud at some point between a byte’s origination and its consumption.
No debate, the numbers and estimates are “big”!
Much as they did during the Internet boom of the 1990s, industries have been struggling to identify their place on the “Big Data” frontier. In parallel, technology companies are jockeying for dominance of a market estimated to reach $24 billion by 2016, and have created a supernova of “Big Data” tools and solutions that are energizing and illuminating the “Big Data” mantra.
As the transformation continues to take place, the lines between “Big Data”, analytics, applications, and data-intensive computing are blurring. Companies are also realizing that the “little” data environments of today are the big data environments of tomorrow. Add in multilingual emerging markets, the need for contextual application of data in those markets, and the necessity of bridging the language divide in a borderless digital universe, and the result is “big” possibilities for innovation in how we work and live with data.
In February 2013, the term “big language” was coined to describe the intersection of many languages with big data’s velocity, volume, variety, and value challenges. Several companies are now offering services for automatic email translation. A single email is small data, but factor in the volume and velocity of email usage across our digital universe and translation becomes a “big data” play. The industry is quickly starting to see that “Big Data” isn’t just about analytics.
During the upcoming TAUS Executive Forum in June, we are proposing a panel discussion on the use of big data and how it might advance the machine translation industry. Does the ability to store, retrieve, process, and analyze exabytes of data provide more details on language syntax and structure? Can big data help us train engines on obscure languages more quickly? Will big data accelerate machine translation into new vertical markets? We hope to explore these and other questions with a deeper understanding of big data capabilities.
Guest author: Rex Martin Jr.
Rex Martin Jr is a Services Architect at Oracle.