Hadoop is a widely used platform for cloud computing in the big-data era, and the MapReduce paradigm has proven to be a highly successful programming model for large-scale data-intensive computing applications. However, the conventional MapReduce model and the Hadoop framework are limited to executing jobs within a single cluster, so traditional single-cluster Hadoop may not be suitable when data and compute resources are widely distributed. This paper focuses on the application of Hadoop across multiple data centers and clusters. A hierarchical distributed computing architecture for Hadoop is designed, and a virtual Hadoop file system is proposed to provide a global data view across multiple data centers. A job submitted by a user is automatically decomposed into several sub-jobs, which are then allocated and executed on the corresponding clusters in a location-aware manner. A prototype based on this architecture shows encouraging results.
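The location-aware decomposition described above can be sketched as follows. This is an illustrative sketch only, not the paper's implementation; the names `Cluster`, `SubJob`, and `decompose_job` are assumptions, and the sketch assumes each sub-job is assigned to a cluster that already hosts its input data blocks.

```python
# Hypothetical sketch of location-aware job decomposition across clusters.
# All names here are illustrative, not from the paper's prototype.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Cluster:
    name: str
    blocks: set  # data blocks hosted by this cluster

@dataclass
class SubJob:
    cluster: str
    blocks: list

def decompose_job(job_blocks, clusters):
    """Split a job's input blocks into per-cluster sub-jobs so that each
    sub-job executes on a cluster that already holds its data."""
    placement = defaultdict(list)
    for block in job_blocks:
        # assign each block to the first cluster that hosts it
        for c in clusters:
            if block in c.blocks:
                placement[c.name].append(block)
                break
    return [SubJob(cluster=name, blocks=blks) for name, blks in placement.items()]

clusters = [Cluster("dc-east", {"b1", "b2"}), Cluster("dc-west", {"b3"})]
subjobs = decompose_job(["b1", "b2", "b3"], clusters)
```

In this sketch, a job over blocks b1–b3 is split into one sub-job per data center, so each sub-job's computation is moved to its data rather than the data being moved to the computation.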