Tuesday, 16 July 2019
YARN Architecture/Job Submission Procedure

YARN Architecture

=> https://www.ibm.com/developerworks/library/bd-yarn-intro/

Image shows architecture of YARN

Key point: the ResourceManager takes care of the ApplicationMasters, while each ApplicationMaster takes care of its own application's tasks.

ApplicationMaster: one per YARN application
– Runs in a container
– Is framework/application specific
– Communicates with the ResourceManager scheduler to request containers to run application tasks
– Ensures NodeManager(s) complete tasks

ResourceManager: one per cluster
– Initiates application start-up
– Schedules resource usage on worker nodes
– Manages nodes
     – Tracks heartbeats from NodeManagers
– Runs a scheduler
     – Determines how resources are allocated
– Manages containers
     – Handles ApplicationMasters’ requests for resources
     – Deallocates containers when they expire or when the application completes
– Manages ApplicationMasters
     – Creates a container for each ApplicationMaster and tracks its heartbeats
– Manages cluster-level security
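
The heartbeat tracking mentioned above can be pictured with a small sketch. This is a toy model, not YARN code: the `HeartbeatTracker` class, the 10-second expiry window, and the node names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    """Toy liveness tracker (hypothetical; not part of any YARN API)."""
    expiry_seconds: float
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node_id: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this node.
        self.last_seen[node_id] = now

    def expired(self, now: float) -> list:
        # A node counts as lost once it has been silent longer than the window.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.expiry_seconds
        )


tracker = HeartbeatTracker(expiry_seconds=10.0)
tracker.heartbeat("node-a", now=0.0)
tracker.heartbeat("node-b", now=0.0)
tracker.heartbeat("node-a", now=8.0)  # node-a keeps reporting; node-b goes silent
lost = tracker.expired(now=12.0)
print(lost)
```

Passing `now` explicitly instead of reading a clock keeps the sketch deterministic; a real implementation would presumably use wall-clock time and a configurable timeout.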

Containers: allocated by the ResourceManager
– Require a certain amount of resources (memory, CPU) on a worker node
– YARN applications run in one or more containers
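
To make the idea of a container as "a slice of memory and CPU on one worker node" concrete, here is a minimal first-fit sketch. The `Node` class, the `allocate` function, and the first-fit policy are illustrative assumptions; YARN's actual schedulers are pluggable and considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_memory_mb: int
    free_vcores: int


def allocate(nodes, memory_mb, vcores):
    """Return the name of the first node that can host the container, or None."""
    for node in nodes:
        if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
            # Reserve the resources on that node for the new container.
            node.free_memory_mb -= memory_mb
            node.free_vcores -= vcores
            return node.name
    return None


nodes = [Node("worker-1", 2048, 2), Node("worker-2", 8192, 4)]
first = allocate(nodes, memory_mb=4096, vcores=2)   # worker-1 lacks memory
second = allocate(nodes, memory_mb=4096, vcores=2)  # worker-2 still has room
third = allocate(nodes, memory_mb=4096, vcores=2)   # nothing fits any more
print(first, second, third)
```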

NodeManager: one per worker node
– Starts application processes
– Manages resources on worker nodes
– Communicates with the ResourceManager
     – Registers and provides information on node resources
     – Sends heartbeats and container status
– Manages processes in containers
     – Launches ApplicationMasters on request from the ResourceManager
     – Launches processes into containers on request from ApplicationMasters
     – Monitors resource usage by containers; kills runaway processes
– Provides logging services to applications
     – Aggregates logs for an application and saves them to HDFS
– Runs auxiliary services
– Maintains node-level security
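
The "monitor resource usage; kill runaway processes" duty can be sketched as a simple comparison of usage against allocation. The function name, container IDs, and figures below are hypothetical, and the sketch considers memory only.

```python
def enforce_limits(allocated_mb, used_mb):
    """Return the IDs of containers whose memory use exceeds their allocation."""
    return sorted(
        cid for cid, used in used_mb.items()
        if used > allocated_mb[cid]
    )


allocated = {"container_01": 1024, "container_02": 2048}
usage = {"container_01": 1500, "container_02": 1900}
to_kill = enforce_limits(allocated, usage)
print(to_kill)
```

Note that the check looks only at what the container consumes, not at what the process inside it is doing.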

Application submission in YARN

 1) Users/clients submit applications to the ResourceManager by running the following command: $ hadoop jar .

 2) The ResourceManager

      – Accepts a new application submission and determines which application should get cluster resources next.

      – Bases the decision on several constraints, such as queue capacity, ACLs, and fairness.

 3) The ResourceManager uses a pluggable Scheduler

      – The Scheduler selects a container in which the ApplicationMaster will run.

      – The Scheduler focuses only on scheduling; it manages who gets cluster resources (in the form of containers) and when.

 4) Once the ApplicationMaster (AM) is started, it is responsible for the whole life cycle of the application.

 5) The ApplicationMaster sends resource requests to the ResourceManager to ask for the containers needed to run the application’s tasks.

      A resource request is simply a request for a number of containers that satisfy some resource requirements, such as:
      – An amount of resources, currently expressed as megabytes of memory and CPU shares
      – A preferred location, specified by hostname, rack name, or * to indicate no preference
      – A priority within this application, not across multiple applications

 6) The ResourceManager grants a container (expressed as a container ID and hostname) that satisfies the requirements stated by the ApplicationMaster in the resource request.

      – A container allows an application to use a given amount of resources on a specific host.

 7) After a container is granted, the ApplicationMaster asks the NodeManager to use these resources to launch an application-specific task. This task can be any process written in any framework (such as a MapReduce task or a Giraph task).

 8) The NodeManager monitors only the resource usage of the containers, not the tasks themselves; for example, it kills a container that consumes more memory than it was initially allocated.

 9) The ApplicationMaster

      – Spends its whole life negotiating containers to launch all of the tasks needed to complete its application.
      – Monitors the progress of the application and its tasks.
      – Restarts failed tasks in newly requested containers, and reports progress back to the client that submitted the application.
      – Shuts itself down and releases its own container after the application is complete.

10) The ResourceManager

      – Checks the health of the ApplicationMasters but does not perform any application task monitoring.
      – Restarts the ApplicationMaster in a new container if the ApplicationMaster fails.
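
The request/grant/launch/retry cycle in the steps above can be walked through with a small in-memory simulation. Everything here is a hypothetical sketch rather than the YARN client API: `ResourceRequest`, `Container`, `ToyResourceManager`, and `run_application` are invented names, the toy ResourceManager checks memory only (it ignores `vcores` and `priority`), and the NodeManager launch step is collapsed into a plain function call.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ResourceRequest:
    memory_mb: int
    vcores: int
    location: str = "*"  # hostname, rack name, or "*" for no preference
    priority: int = 0    # meaningful only within one application


@dataclass(frozen=True)
class Container:
    container_id: str
    hostname: str


class ToyResourceManager:
    def __init__(self, free_memory_mb):
        self.free_memory_mb = dict(free_memory_mb)  # hostname -> free MB
        self._ids = count(1)

    def grant(self, request):
        # Honour a preferred host if one is given, otherwise try every host.
        hosts = list(self.free_memory_mb) if request.location == "*" else [request.location]
        for host in hosts:
            if self.free_memory_mb.get(host, 0) >= request.memory_mb:
                self.free_memory_mb[host] -= request.memory_mb
                return Container(f"container_{next(self._ids):02d}", host)
        return None  # no host can satisfy the request right now

    def release(self, container, request):
        # Return the container's memory to its host.
        self.free_memory_mb[container.hostname] += request.memory_mb


def run_application(rm, tasks, request, run_task, max_attempts=2):
    """Toy ApplicationMaster loop: a failed task is retried in a new container."""
    log = []
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            container = rm.grant(request)
            if container is None:
                log.append((task, attempt, None, "no-capacity"))
                break
            ok = run_task(task, attempt)  # stands in for the NodeManager launch
            rm.release(container, request)
            log.append((task, attempt, container.hostname, "ok" if ok else "failed"))
            if ok:
                break
    return log


def flaky(task, attempt):
    # Hypothetical task runner: "map-2" fails on its first attempt only.
    return not (task == "map-2" and attempt == 1)


rm = ToyResourceManager({"host-a": 1024, "host-b": 1024})
request = ResourceRequest(memory_mb=1024, vcores=1)
log = run_application(rm, ["map-1", "map-2"], request, flaky)
for entry in log:
    print(entry)
```

The log records one line per task attempt, so the retry of `map-2` shows up as a second entry with a fresh container grant, mirroring the "restarts failed tasks in newly requested containers" behaviour.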
