Enterprises struggle to store and manage Big Data because it exceeds the capacity of current relational systems, and the reason is clear: those legacy systems were designed decades ago, long before Big Data was front and center in the collective imagination. For Telcos, the velocity of data growth and the rising number of subscribers mean that traditional data analytics software can take months to process information that is needed in real time. Existing tools were limited to operational reporting: reading log files and extracting information from them. The more critical mobile data services have become to the business, the greater the need to monitor their contribution. That means being able to track the number of ‘unique active users’ of each service, information that had not been easy to come by previously. Today’s relational databases and business intelligence tools are powerful, but what if you have 100 or 1,000 times the data?
Enter Hadoop! No, not the toy elephant it was named after, but an open-source platform, often run in the cloud, capable of mining Big Data on a vast scale by harnessing huge arrays of inexpensive computer processing power. Hadoop is a software framework for storing and processing large data sets on clusters of commodity hardware. It is engineered to spread that processing across hundreds if not thousands of plain vanilla servers (and eventually, in Google’s vision, millions of machines) arranged in a cluster, rather than relying on super-expensive proprietary machines. Hadoop is an Apache top-level project built and used by a global community of contributors and users. It makes it possible to run applications on clusters of thousands of nodes holding thousands of terabytes. Its distributed file system (HDFS) provides high data transfer rates between nodes and lets the system keep operating uninterrupted when a node fails. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
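Much of that resilience comes from how HDFS stores data: files are split into blocks, and each block is copied to several different nodes (three by default). The short Python sketch that follows uses no Hadoop API; it only simulates the idea. The node names, the block count and the purely random placement are assumptions for illustration, since real HDFS placement is rack-aware.

```python
import random

# 3 mirrors HDFS's default replication factor (dfs.replication); the rest is a toy model.
REPLICATION = 3


def place_blocks(num_blocks: int, nodes: list[str], replication: int = REPLICATION,
                 seed: int = 0) -> dict[int, set[str]]:
    """Assign each block to `replication` distinct nodes chosen at random.

    Simplification: real HDFS placement is rack-aware; this sketch ignores racks.
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in range(num_blocks)}


def readable_blocks(placement: dict[int, set[str]], failed: set[str]) -> set[int]:
    """Return the blocks that still have at least one replica on a live node."""
    return {block for block, replicas in placement.items() if replicas - failed}


if __name__ == "__main__":
    cluster = [f"node{i:02d}" for i in range(10)]  # hypothetical 10-node cluster
    placement = place_blocks(1_000, cluster)
    # Each block lives on 3 distinct nodes, so losing any 2 nodes cannot lose a block.
    print(len(readable_blocks(placement, {"node03", "node07"})))  # prints 1000
```

Losing any two nodes leaves every block readable because each block has three distinct replicas; in a real cluster the NameNode would also re-replicate the affected blocks to restore the target count.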
By turning storage and processing into a commodity, Hadoop allows organizations to be more nimble and agile. SQL users could query data through SQL-on-Hadoop tools, a new product team could instead use Scalding or Hadoop Streaming with standard Unix tools, and a data scientist could use the Python libraries they have come to depend on, all within the same framework and, more importantly, on the same infrastructure. And what does this all mean for the traditional telecom service provider, not to mention those companies’ hosting and cloud computing groups? Plenty! Not only is Hadoop processing likely to be an important application for cloud providers, but telecom operators, sitting on reams of network and customer data, are prime candidates to become Hadoop users.
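To make the Python-on-Hadoop point concrete, here is a minimal sketch of counting ‘unique active users’ per service in the Hadoop Streaming style, where the mapper and reducer are ordinary programs that read lines and write tab-separated key/value pairs. The record layout (timestamp,subscriber_id,service) is a hypothetical assumption, and `run_locally` only imitates Hadoop’s shuffle-and-sort step so the logic can be checked without a cluster.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'service<TAB>subscriber_id' for each well-formed usage record.

    Assumes a hypothetical CSV layout: timestamp,subscriber_id,service
    """
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip malformed records rather than failing the whole job
        _timestamp, subscriber_id, service = parts
        yield f"{service}\t{subscriber_id}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Count distinct subscribers per service.

    Relies on the input being sorted by key, as Hadoop guarantees for reducers,
    so all lines for one service arrive together.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for service, group in groupby(pairs, key=lambda kv: kv[0]):
        unique_users = {subscriber for _, subscriber in group}
        yield f"{service}\t{len(unique_users)}"


def run_locally(records: Iterable[str]) -> list[str]:
    """Imitate map -> shuffle/sort -> reduce in one process for testing."""
    return list(reducer(sorted(mapper(records))))


if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else ""
    if stage == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif stage == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        demo = [
            "2014-01-01T10:00,sub1,email",
            "2014-01-01T10:05,sub2,email",
            "2014-01-01T10:07,sub1,email",  # repeat visit: counted once
            "2014-01-01T11:00,sub1,news",
            "malformed line",
        ]
        print(run_locally(demo))  # ['email\t2', 'news\t1']
```

On a cluster, the same file could be passed to Hadoop Streaming through its `-mapper` option (with the `map` argument) and its `-reducer` option (with the `reduce` argument); Hadoop then sorts the mapper output by key before it reaches the reducer, which is what makes the `groupby` step valid.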
China Mobile is using Hadoop as a telecom data mining platform, showing how operators can tap this powerful technology to better understand their networks, services and customers, finding new patterns and insights that can help them compete in the digital future. China has been more receptive to open-source than to proprietary software because of the engineering capabilities it can build with it. The government of the Chinese province of Zhejiang is tapping Hadoop to handle about 2.5 petabytes of data generated each month by video streams from CCTV cameras. With Hadoop, the Zhejiang government was able to solve the big data problem of storing, monitoring, searching, and analyzing that data in real time. In India, a mobile advertising firm used Hadoop to help its telco customers deliver relevant, real-time information to subscribers. For example, a subscriber who tried to buy a product without enough prepaid mobile credit will be reminded to complete the purchase after reloading credit into their account.
Google uses MapReduce, its own proprietary system whose published design inspired Hadoop, to help it swallow the entire Web, not to mention massive map and satellite databases, to produce elegantly useful products such as Google Maps, Google Earth and, of course, the Google search engine itself. Yahoo! uses Hadoop to analyze and optimize how its 20 million visitors consume its home page content. The New York Times set a Hadoop-powered cloud against an 11-million-story archive dating back to 1851 to make it instantly searchable. Facebook uses Hadoop to analyze interactions and social graph links on its site, which grows by more than 20 terabytes of new data per day, powering the friend connections and personalization that drive the social networking site.
One of the largest mobile network operators in Europe wanted to leverage the data it captured on mobile usage to achieve a number of specific benefits. Managing advanced data services, such as packages that provide mobile Internet access on a range of devices, and applications including mobile email, instant messaging, Google search, news and sports updates, and weather and traffic reports, was a monumental and expensive task. Accurate data about real customer activity would help drive changes to its portal, giving users easy access to the applications they use most often. Analysis of service usage enables the company to spot emerging trends, market them intelligently to customers, and keep its customer touch points current.
After the Hadoop implementation, individual business units can report on usage and provide other KPI information, because Hadoop allows huge amounts of data to be stored at a granular level in a way that is both cost-effective and performant. With greater and more granular visibility, the company can keep its mobile applications and web portals fresh and in line with current customer interests, increase revenues through higher mobile application sales, and reduce customer churn. The Hadoop solution consolidates all information requirements in a single environment and enables reliable, ad hoc analysis and end-user self-service. This accelerates the delivery of critical business performance information to the point of need, quickly enough for that intelligence to be useful and actionable. The solution handles large volumes of data, is easily configurable by users, and provides graphics of the results, including the ability to drill down into the detail through dashboards. It’s all automatic. Before, users would send emails and make calls to chase the data. With Hadoop, anyone across the business can access the information they need and find it on their own.
Big Data in the enterprise should not live in a vacuum. It materializes from dozens of databases, applications, and external sources. It has to be ingested, transformed, enriched, analyzed, and ultimately shared back with the people who will use it to make decisions and serve customers. Hadoop can handle all types of data from disparate systems: structured and unstructured data, log files, pictures, audio files, communications records and email; just about anything you can think of, regardless of its native format. Even when different types of data have been stored in unrelated systems, you can dump it all into your Hadoop cluster with no prior need for a schema. With network data volumes on the rise, telecom companies must keep a close watch on their networks to keep them functioning at peak performance, which is the key to retaining customers. Relying on data samples or aggregations isn’t an option. Hadoop is the one super system that can scale to accommodate these data volumes at a reasonable cost.
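Loading data with no prior schema is often called schema-on-read: raw records are stored as they arrive, and structure is imposed only when a job reads them. The Python sketch that follows illustrates the idea in a single process rather than on a cluster. The three record formats, the field names and the sample phone numbers are all hypothetical; in a real deployment the parsing step could run inside a map task or behind a SQL-on-Hadoop table definition.

```python
import csv
import json
from collections import Counter
from typing import Iterable, Optional

# Raw records stored exactly as they arrived; no schema was imposed on write.
# All three formats and the numbers are hypothetical examples.
RAW_RECORDS = [
    '{"msisdn": "447700900001", "event": "app_open", "app": "email"}',  # JSON app log
    "CDR,447700900002,voice,180",                                       # CSV call record
    "ts=1388570400 msisdn=447700900001 event=data_session mb=12",       # key=value probe log
    "\x00\x01 corrupted fragment",                                      # matches no format
]


def read_with_schema(raw: str) -> Optional[dict]:
    """Impose structure at read time; return None if no known format matches."""
    if raw.startswith("{"):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            return None
    if raw.startswith("CDR,"):
        row = next(csv.reader([raw]))
        if len(row) != 4 or not row[3].isdigit():
            return None
        _tag, msisdn, kind, seconds = row
        return {"msisdn": msisdn, "event": kind, "seconds": int(seconds)}
    if "=" in raw:
        try:
            return dict(field.split("=", 1) for field in raw.split())
        except ValueError:  # a token without '=' means this is not a key=value line
            return None
    return None


def events_per_subscriber(raw_records: Iterable[str]) -> Counter:
    """Example query: count parseable events per subscriber across all formats."""
    counts: Counter = Counter()
    for raw in raw_records:
        record = read_with_schema(raw)
        if record and "msisdn" in record:
            counts[record["msisdn"]] += 1
    return counts


if __name__ == "__main__":
    print(events_per_subscriber(RAW_RECORDS))
```

Run on the sample records, the query counts two events for the first subscriber (one JSON line, one key=value line) and one for the second, while the corrupted fragment is skipped rather than rejected at load time.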
Sadiq Malik ( Telco Strategist )