Wednesday, June 14, 2017

Dispatcher (AEM)

Dispatcher has three main jobs:
1) Load balancing
2) Caching
3) Security (filtering)

———————————————————————————————————————————————————
Dispatcher is a module that runs inside a web server, such as Apache HTTP Server or Microsoft IIS.
Dispatcher's own configuration file is dispatcher.any.
Each web server also has its own configuration file; for Apache this is httpd.conf.
——————————————————————————————————————————————————
Should Dispatcher sit in front of an author instance or a publish instance?
It can sit in front of either, or both.
——————————————————————————————————————————————————
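Why would you set up Dispatcher in front of author instances?
Dispatcher is used mainly for caching, but it can also balance load.
For example, when two author instances sit behind a single entry point, Dispatcher itself can act as the load balancer.
The balancing runs inside the Dispatcher module. It is often described as round robin, but Adobe's documentation describes it as statistics-based: Dispatcher tracks how fast each render responds and sends each request to the one it expects to answer fastest.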
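A toy Python model contrasting plain round robin with picking the render that has answered fastest so far (the statistics-based approach Adobe's documentation describes for Dispatcher). The render names follow the /rend01 naming used in the /renders section; the classes are hypothetical and are not Dispatcher's implementation, which also handles sticky connections and other details.

```python
from itertools import cycle


class RoundRobin:
    """Hand out render names in a fixed rotation."""

    def __init__(self, renders):
        self._rotation = cycle(renders)

    def pick(self):
        return next(self._rotation)


class FastestFirst:
    """Pick the render with the lowest average response time seen so far.

    A render with no measurements yet counts as 0 ms, so it is tried first.
    """

    def __init__(self, renders):
        self.times = {render: [] for render in renders}

    def record(self, render, millis):
        self.times[render].append(millis)

    def pick(self):
        def average(render):
            samples = self.times[render]
            return sum(samples) / len(samples) if samples else 0.0

        return min(self.times, key=average)


rr = RoundRobin(["rend01", "rend02"])
print([rr.pick() for _ in range(4)])  # ['rend01', 'rend02', 'rend01', 'rend02']

ff = FastestFirst(["rend01", "rend02"])
ff.record("rend01", 120)
ff.record("rend02", 40)
print(ff.pick())  # rend02
```

Round robin ignores how busy a render is; the statistics-based choice shifts traffic away from a slow instance automatically.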
————————————————————————————————————————————————————
Setup references:
dispatcher.any file
Dispatcher-specific configuration entries:
In httpd.conf, the Dispatcher-specific configuration entries are placed after the LoadModule entry. The example configuration below applies to both Unix and Windows.
Setup steps:
1) Install Apache web server 2.2 or 2.4 and check that the Apache home page opens (localhost:8080/ in this setup).
2) How do we download the Dispatcher module?
The Dispatcher module is available from Adobe Package Share; on Windows it is a .dll file (dispatcher.dll).
3) Place dispatcher.dll in the modules folder of your web server, and give the path of the .dll in the LoadModule entry.
How do you debug Dispatcher? Check the configuration in dispatcher.any and read the Dispatcher log.
4) Now let us understand the Dispatcher configuration.
Whatever web server you use, you need to download the Dispatcher module built for that web server.
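Extra: in httpd.conf, load the Dispatcher module and specify its settings.
These are the lines to add to httpd.conf:
# Use the actual file name you copied into modules/ (a .so file on Unix)
LoadModule dispatcher_module modules/dispatcher.dll
<IfModule disp_apache2.c>
DispatcherConfig conf/dispatcher.any
DispatcherLog logs/dispatcher.log
DispatcherLogLevel 3
DispatcherNoServerHeader 0
DispatcherDeclineRoot 0
DispatcherUseProcessedURL 0
DispatcherPassError 0
DispatcherKeepAliveTimeout 60
</IfModule>
# Hand requests to Dispatcher
<Directory />
<IfModule disp_apache2.c>
SetHandler dispatcher-handler
</IfModule>
</Directory>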
Extra points to note:
DispatcherNoServerHeader
Defines the Server header to be used:
undefined or 0 - the HTTP Server header contains the AEM version
1 - the Apache Server header is used
DispatcherDeclineRoot
Defines whether to decline requests to the root "/":
0 - accept requests to /
1 - requests to / are not handled by Dispatcher; use mod_alias for the correct mapping.
Example:
http://localhost:8080/ is the root; Dispatcher handles it only when DispatcherDeclineRoot is 0.
http://localhost:8080/content is below the root; Dispatcher handles it with either setting.
So whether the root is accepted depends on the DispatcherDeclineRoot setting.
Example:
As explained, Dispatcher sits in front of the publish instance.
Whatever path is available on the publish instance can be requested through the web server instead, so clients never reach the AEM server directly.
In the examples above, the AEM server itself is not exposed; this is one of the main purposes of Dispatcher.

DispatcherPassError
Defines how to support 40x error codes for ErrorDocument handling:
0 - Dispatcher spools all error responses to the client.
1 - Dispatcher does not spool an error response to the client (where the status code is greater than or equal to 400) but passes the status code to Apache, which, for example, allows an ErrorDocument directive to process such a status code.

To do: learn about Apache rewrite rules (mod_rewrite).
——————————————————————————————————————————————————
Dispatcher.any
References:

By default, the Dispatcher configuration is stored in the dispatcher.any text file, though you can change the name and location of this file during installation.
————————————————————————————————————————————————————
What is a farm?
The /farms property defines one or more sets of Dispatcher behaviors, where each set is associated with different websites or URLs. The /farms property can include a single farm or multiple farms.

Example:
There are two subdomains:
honda.com.au has its own farm.
civic.honda.com.au has its own farm.
That is, multiple websites are defined under the one /farms property, each as its own farm.
———————————————————————————————————————————————————
Identifying virtual hosts - /virtualhosts
The /virtualhosts property defines a list of all hostname/URI combinations that Dispatcher accepts for this farm.
You can use the asterisk ("*") character as a wildcard. Values for the /virtualhosts property use the following format:
[scheme]host[uri][*]

scheme: (optional) Either http:// or https://
host: The name or IP address of the host computer, and the port number if necessary.
uri: (optional) The path to the resources.

The following example configuration handles requests for the .com and .ch domains of myCompany and all domains of mySubDivision:

/virtualhosts
{
"www.mycompany.com"
"www.mycompany.ch"
"www.mysubdivision.*"
}
The following configuration handles all requests:
/virtualhosts
{
"*"
}
Best practice is to list every domain explicitly in /virtualhosts rather than rely on "*".
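A simplified sketch of how a request could be matched to a farm through /virtualhosts entries of the form [scheme]host[uri][*]. It returns the first farm whose list matches, which is a simplification: Dispatcher's documented resolution order (bottom farm upward, preferring entries that match scheme and URI as well as host) is more involved. The farm names and the honda.com.au hosts are taken from the farm example; the functions are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern, scheme, host, uri):
    """Check one /virtualhosts entry of the form [scheme]host[uri][*]."""
    if "://" in pattern:
        p_scheme, _, pattern = pattern.partition("://")
        if p_scheme.lower() != scheme.lower():
            return False
    host_pat, slash, uri_pat = pattern.partition("/")
    # Host names are case-insensitive; "*" works as a wildcard.
    if not fnmatchcase(host.lower(), host_pat.lower()):
        return False
    # No URI part in the entry means any path on that host is accepted.
    return not slash or fnmatchcase(uri, slash + uri_pat)


def select_farm(farms, scheme, host, uri):
    """Return the first farm whose /virtualhosts list matches the request."""
    for name, patterns in farms.items():
        if any(matches(p, scheme, host, uri) for p in patterns):
            return name
    return None


farms = {
    "honda": ["www.honda.com.au", "honda.com.au"],
    "civic": ["civic.honda.com.au"],
}
print(select_farm(farms, "https", "civic.honda.com.au", "/content/civic/en.html"))  # civic
print(select_farm(farms, "http", "www.honda.com.au", "/"))  # honda
```

Listing exact host names, as in the farms above, keeps the choice of farm predictable; a lone "*" would make the first farm swallow every request.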
——————————————————————————————————————————————————————

From the above we learn two things:
1) /farms is multivalued: it can hold several farms.
2) /virtualhosts tells Dispatcher which requests a farm is going to handle.
Section 1:
Specifying the HTTP headers to pass through - /clientheaders
The /clientheaders property defines a list of HTTP headers that Dispatcher passes from the client HTTP request to the renderer (the AEM instance).
By default, Dispatcher forwards the standard HTTP headers to the AEM instance. In some cases, you might want to forward additional headers or remove specific headers:
Add headers, such as custom headers that your AEM instance expects in the HTTP request.
Remove headers, such as authentication headers that are only relevant to the web server.

If you customize the set of headers to pass through, you must specify an exhaustive list of headers, including those that are normally included by default.
For example, a Dispatcher instance that handles page activation requests for publish instances requires the PATH header in the /clientheaders section. The PATH header enables communication between the replication agent and Dispatcher.
The following is an example /clientheaders configuration:
/clientheaders
{
"referer"
"user-agent"
"authorization"
# Needed when this Dispatcher receives activation (flush) requests
"path"
}
In other words: when a client fires an HTTP request, Dispatcher receives it first, whether it sits in front of the author or the publish instance, and /clientheaders decides which of the request headers are passed through to AEM.
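A minimal sketch of the pass-through idea: only headers named in the /clientheaders list reach AEM, compared case-insensitively as HTTP header names are. The function and sample headers are hypothetical illustrations, not Dispatcher's code.

```python
def forward_headers(request_headers, clientheaders):
    """Keep only the request headers named in /clientheaders.

    Header names are compared case-insensitively, as HTTP requires.
    """
    allowed = {name.lower() for name in clientheaders}
    return {name: value for name, value in request_headers.items()
            if name.lower() in allowed}


clientheaders = ["referer", "user-agent", "authorization", "path"]
incoming = {
    "Referer": "https://www.example.com/",
    "User-Agent": "Mozilla/5.0",
    "X-Debug": "1",
}
print(forward_headers(incoming, clientheaders))
# {'Referer': 'https://www.example.com/', 'User-Agent': 'Mozilla/5.0'}
```

This is also why a customized list must be exhaustive: any header left out, even a standard one, simply never reaches the AEM instance.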

—————————————————————————————————————————————————
Section 2:
The load will be balanced among these render instances:
/renders
{
# /renders is multivalued: one entry per render instance
/rend01
{
# Hostname or IP of the render
/hostname "localhost"
# Port of the render
/port "4503"
# Connect timeout in milliseconds, 0 to wait indefinitely
# /timeout "5000"
}
}
Add /rend02 in the same way, and the load is balanced between these two render instances.
If there are 10 publish instances, define /rend01 through /rend10, one entry for each of the 10 servers.
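Writing ten near-identical entries by hand is error-prone, so here is a small hypothetical helper that generates the /renders section from a list of (hostname, port) pairs. The ports 4503 to 4512 on localhost are an assumed setup for illustration; substitute your real publish hosts.

```python
def renders_block(instances):
    """Build a dispatcher.any /renders section from (hostname, port) pairs."""
    lines = ["/renders", "  {"]
    for i, (host, port) in enumerate(instances, start=1):
        lines += [
            f"  /rend{i:02d}",
            "    {",
            f'    /hostname "{host}"',
            f'    /port "{port}"',
            "    }",
        ]
    lines.append("  }")
    return "\n".join(lines)


# Hypothetical: ten publish instances on one machine, ports 4503..4512.
print(renders_block([("localhost", 4503 + n) for n in range(10)]))
```

The zero-padded names (rend01 ... rend10) only need to be unique within /renders; the padding just keeps them sorted.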
—————————————————————————————————————————————————————
Section 3:
Configuring access to content - /filter

Use the /filter section to specify the HTTP requests that Dispatcher accepts. All other requests are sent back to the web server with a 404 error code (page not found). If no /filter section exists, all requests are accepted.

Defining a filter

Each item in the /filter section includes a type and a pattern that is matched against a specific element of the request line or against the entire request line. Each filter can contain the following items:

Type: The /type indicates whether to allow or deny access for requests that match the pattern. The value can be either allow or deny.

Element of the request line: Include /method, /url, /query or /protocol and a pattern for filtering requests according to these specific parts of the request line of the HTTP request. Filtering on elements of the request line (rather than on the entire request line) is the preferred filter method.

Glob property: The /glob property is matched against the entire request line of the HTTP request.

Whatever follows /glob is the pattern.
/glob means the pattern, including wildcards, is compared with the request line.
Because the request line contains more than the path (for example GET /content/page.html HTTP/1.1), I always add "*" to the pattern so that it matches the whole line.
Sometimes the pattern also includes the method, for example "GET *", so the filter only applies to GET requests, that is, requests that fetch content.
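A sketch of /glob filtering on whole request lines, assuming, as Dispatcher's documentation describes for overlapping patterns, that the last matching rule wins. The rule list is hypothetical, and the "no rule matched means deny" case is an assumption of this sketch (typical configurations start with a deny "*" rule so every request matches something).

```python
from fnmatch import fnmatchcase


def is_allowed(request_line, filters):
    """Decide whether a request line passes a list of /glob filter rules.

    Rules are checked in order and the last matching rule wins.
    An empty list accepts everything (no /filter section).
    Assumption for this sketch: if rules exist but none match, deny.
    """
    if not filters:
        return True
    decision = "deny"
    for rule in filters:
        if fnmatchcase(request_line, rule["glob"]):
            decision = rule["type"]
    return decision == "allow"


# Hypothetical rules: deny everything, then open up GET requests for content,
# then close JSON renditions again.
filters = [
    {"type": "deny", "glob": "*"},
    {"type": "allow", "glob": "GET /content/*"},
    {"type": "deny", "glob": "* /content/*.json*"},
]

print(is_allowed("GET /content/geometrixx/en.html HTTP/1.1", filters))  # True
print(is_allowed("POST /content/geometrixx/en.html HTTP/1.1", filters))  # False
print(is_allowed("GET /content/geometrixx/en.json HTTP/1.1", filters))  # False
```

Starting with a blanket deny and then allowing narrow patterns keeps anything you forgot to mention blocked, which is the safer default for a publish-facing Dispatcher.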
——————————————————————————————————————————————————————
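Section 4:
Cache section:
The /cache section controls which responses are cached and where they are cached.
Its properties are:
/docroot: the directory where cached files are stored
/statfile: the file whose timestamp records the last content update; cached files older than it are treated as stale. (How deep in the cache tree .stat files are created is controlled by /statfileslevel.)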
For example, take a page under /content/geometrixx/en. Apart from caching, flushing must also happen: when there is new content, the cached copy has to be flushed (Dispatcher flush).
How does this work?
It relies on timestamps.
When I activate a page, the Dispatcher flush agent sends an invalidation request to Dispatcher, which touches the .stat file and so updates its timestamp.
If a cached file is older than the .stat file, it is treated as stale and fetched again from AEM on the next request.
The Dispatcher flush agent is already set up
(it ships as part of the AEM installation).
/serveStaleOnError
/allowAuthorized
/rules
/statfileslevel
/invalidate
/invalidateHandler
/allowedClients
/ignoreUrlParams
/headers
An example cache section might look as follows:
/cache
{
/docroot "/opt/dispatcher/cache"
/statfile "/tmp/dispatcher-website.stat"
/allowAuthorized "0"

/rules
{
# List of files that are cached
}
/invalidate
{
# List of files that are auto-invalidated
}
}
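A toy model of stat-file invalidation as Adobe's documentation describes it: a flush only touches the .stat file, and any cached file written before that moment is re-fetched on its next request. A logical clock stands in for file modification times; the class, path and version strings are hypothetical.

```python
class ToyCache:
    """Toy model of stat-file invalidation (not Dispatcher's code).

    Each cached entry remembers when it was written. A flush just bumps
    the stat-file timestamp; entries written before it count as stale.
    """

    def __init__(self, fetch):
        self.fetch = fetch          # callable: path -> content from AEM
        self.clock = 0              # logical time instead of file mtimes
        self.statfile_time = 0
        self.entries = {}           # path -> (content, written_at)

    def _tick(self):
        self.clock += 1
        return self.clock

    def flush(self):
        """What an activation triggers via the flush agent: touch the stat file."""
        self.statfile_time = self._tick()

    def get(self, path):
        entry = self.entries.get(path)
        if entry is not None and entry[1] > self.statfile_time:
            return entry[0], "cache hit"
        content = self.fetch(path)
        self.entries[path] = (content, self._tick())
        return content, "fetched from render"


page = "/content/geometrixx/en.html"
versions = {page: "v1"}
cache = ToyCache(lambda path: versions[path])
print(cache.get(page))  # ('v1', 'fetched from render')
print(cache.get(page))  # ('v1', 'cache hit')
versions[page] = "v2"   # page re-activated in AEM
cache.flush()
print(cache.get(page))  # ('v2', 'fetched from render')
```

Nothing is deleted on a flush; files are only compared with the .stat timestamp, which makes invalidation cheap even for large caches.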

Caching when authentication is  used:
The /allowAuthorized property controls whether requests that contain any of the following authentication information are cached:

The authorization header
A cookie named authorization
A cookie named login-token

By default, requests that include this authentication information are not cached, because authentication is not performed when a cached document is returned to the client. This configuration prevents Dispatcher from serving cached documents to users who do not have the necessary rights.

However, if your requirements permit the caching of authenticated documents, set /allowAuthorized to one:

/allowAuthorized "1"

——————————————————————————————————————————————————
When the content is sensitive, we require authorization, and such responses should not be cached.
————————————————————————————————————————————————————
What exactly to cache is the most important decision.
The /rules property controls which documents are cached, according to the document path. Regardless of the /rules property, Dispatcher never caches a document in the following circumstances:
If the HTTP method is not GET.
Other common methods are POST for form data and HEAD for the HTTP header.
If the request URL contains a question mark ("?").
This usually indicates a dynamic page, such as a search result, that does not need to be cached.
If the file extension is missing.
The web server needs the extension to determine the document type (the MIME type).
If the authentication header is set (this can be configured).
(That is, you don't want to cache it; for example, net banking pages are never cached.)
If the AEM instance responds with the following headers:
no-cache
no-store
must-revalidate

Each item in the /rules property includes a glob pattern and a type:
The glob pattern is used to match the path of the document.
The type indicates whether to cache the documents that match the glob pattern. The value can be either allow (cache the document) or deny (always render the document).
Note: by default, a URL with query parameters is not cached.
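A toy check that combines the "never cache" conditions listed above with /rules, assuming the last matching glob decides (no match is treated as deny in this sketch). The function and rule list are hypothetical; response headers (no-cache, no-store, must-revalidate) and the authorization / login-token cookies are deliberately left out to keep it short.

```python
import posixpath
from fnmatch import fnmatchcase


def is_cacheable(method, url, headers, rules, allow_authorized=False):
    """Toy check of whether a response for this request could be cached."""
    if method.upper() != "GET":
        return False
    if "?" in url:
        return False  # query string: treated as a dynamic page
    if not posixpath.splitext(url)[1]:
        return False  # no extension, so no MIME type
    header_names = {name.lower() for name in headers}
    if "authorization" in header_names and not allow_authorized:
        return False  # mirrors /allowAuthorized "0"
    decision = "deny"
    for rule in rules:
        if fnmatchcase(url, rule["glob"]):
            decision = rule["type"]  # last matching rule wins
    return decision == "allow"


# Hypothetical rules: cache everything except a banking area.
rules = [
    {"glob": "*", "type": "allow"},
    {"glob": "/content/bank/*", "type": "deny"},
]
print(is_cacheable("GET", "/content/site/en.html", {}, rules))       # True
print(is_cacheable("GET", "/content/bank/account.html", {}, rules))  # False
```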

What do we do if we want to cache URLs with query parameters?
Configure /ignoreUrlParams: if every parameter in the request is ignored, the page is cached.
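A sketch of the /ignoreUrlParams decision, assuming the documented semantics as I understand them: each rule globs a parameter name, the last matching rule decides, and /type "allow" means the parameter is ignored. The page is cacheable only if every parameter is ignored. The rule list (ignoring utm_* tracking parameters) is a hypothetical example.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlsplit


def all_params_ignored(url, ignore_rules):
    """Return True if every query parameter in url is ignored."""
    query = urlsplit(url).query
    for name, _ in parse_qsl(query, keep_blank_values=True):
        ignored = False
        for rule in ignore_rules:
            if fnmatchcase(name, rule["glob"]):
                ignored = rule["type"] == "allow"  # "allow" = ignore it
        if not ignored:
            return False
    return True


# Hypothetical rules: ignore tracking parameters, keep everything else.
ignore_rules = [
    {"glob": "*", "type": "deny"},
    {"glob": "utm_*", "type": "allow"},
]
print(all_params_ignored("/content/site/en.html?utm_source=mail", ignore_rules))  # True
print(all_params_ignored("/content/site/en.html?utm_source=mail&q=aem", ignore_rules))  # False
```

Ignoring a parameter means every value of it is served the same cached page, so only ignore parameters that do not change the rendered output.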

Once you have set up Dispatcher, to check whether it is working, open localhost:8080 and confirm that the Apache page opens.
Add the Dispatcher module to the modules folder.
Go to logs > dispatcher.log and check whether the server is writing entries with the current timestamp.
Once that is done, add 4 to 5 localhost renders to /renders.

After that, Dispatcher is running on localhost (that is, requests are served through localhost).
