The purpose of this document is to provide a very thorough explanation of what happens
during an HTTP request to an ASP.NET page being served up through IIS. It starts
out with the basics but then dives into the gory details, more detail than is probably
needed for basic web page development. However, a thorough knowledge of this process
is invaluable when debugging strange problems, developing ASP.NET custom web controls
or developing any sort of web framework.
It is important to understand that HTTP is the only method that a client (e.g. browser)
uses to communicate with a web server. The client runs on one computer and the server
on another. The client connects a socket to the server and sends an HTTP request.
The server does some processing and returns an HTTP response. That's it. IIS Applications,
ASP.NET, .NET HTTP Modules, AJAX, Web Services, etc. are all just abstractions on
this concept. A deep understanding of any web technology requires a good working knowledge
of HTTP.
HTTP Request
The good news is that HTTP is quite simple. At heart HTTP has two concepts - a request
and a response. Let's look at a sample HTTP request that would be sent from a browser
to a web server:
GET /Default.aspx HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, ...
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 ...
Host: localhost
Cookie: ASP.NET_SessionId=xshitibm0r1nlpawjvwfzn55
Connection: Keep-Alive
The most important line is the first one. It says that we're doing a GET request
for the Default.aspx page in the web root. The URL for this request was
"http://localhost/Default.aspx". The other method a browser commonly uses is POST
(HTTP defines more, such as HEAD, PUT and DELETE, but they are not needed here).
There is some confusion about the difference between GET and POST. Some say that
GET is used to get information from the server and POST is used to send information
to the server. Although this describes typical usage it is not strictly true. GET
often sends information to the server through the "query string". The query string
is a list of name-value pairs that are appended to the URL. For example:
GET /Default.aspx?userID=123&password=mydogsname HTTP/1.1
Conversely, POST need not send any information to the server, and a POST results
in a response being sent from the web server just like a GET does. This POST is
equivalent to the GET example above:
POST /Default.aspx HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded
Content-Length: 30

userID=123&password=mydogsname
So the only real difference (as far as HTTP is concerned) is the location in the
request of the name-value parameters: in the URL for a GET, in the request body for
a POST. Most web servers set a max size on the query string though, so if large
amounts of data need to be sent a POST is usually used. Also note that a POST can
still specify a query string after the page.
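To make the query string idea concrete, here is a minimal sketch of how the first line of a request could be split into its method, path and name-value pairs. The class and method names (RequestLineSketch, ParseQuery) are made up for illustration; this is not how IIS or ASP.NET actually implement it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class RequestLineSketch
{
    // Splits e.g. "GET /Default.aspx?userID=123 HTTP/1.1" into the method,
    // the path, and a dictionary of decoded query-string pairs.
    public static Dictionary<string, string> ParseQuery(
        string requestLine, out string method, out string path)
    {
        string[] parts = requestLine.Split(' ');
        if (parts.Length != 3)
            throw new FormatException("Expected: METHOD target VERSION");

        method = parts[0];
        string target = parts[1];
        Dictionary<string, string> query = new Dictionary<string, string>();

        int q = target.IndexOf('?');
        if (q < 0)
        {
            path = target;          // no query string at all
            return query;
        }

        path = target.Substring(0, q);
        foreach (string pair in target.Substring(q + 1).Split('&'))
        {
            if (pair.Length == 0) continue;
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            // '+' stands for a space; %XX escapes are decoded by UnescapeDataString.
            query[Uri.UnescapeDataString(name.Replace('+', ' '))] =
                Uri.UnescapeDataString(value.Replace('+', ' '));
        }
        return query;
    }
}
```

Passing the GET example from above yields the method "GET", the path "/Default.aspx" and the two pairs userID and password. A form POST body uses the same name=value&name=value encoding, so the same loop could read it.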
The apparent differences (in usage) of GET and POST exist because of the way a
browser initiates them. GET is used when the user types in a URL, clicks a link,
or client-side script navigates the browser. In all these cases the query string
is explicitly and manually created. A POST occurs when an HTML FORM element (with
method="post") is "submitted". In this case the browser automatically creates the
name-value parameters in the request body from the INPUT, SELECT and TEXTAREA
elements on the page.
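As a rough sketch of what the browser assembles when a form is submitted, the following builds the text of a form POST from a list of field names and values. FormPostSketch and Build are hypothetical names, and a real browser sends more headers than shown here.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class FormPostSketch
{
    // names[i] pairs with values[i], like the INPUT elements of a form.
    public static string Build(string host, string path, string[] names, string[] values)
    {
        if (names.Length != values.Length)
            throw new ArgumentException("names and values must have the same length");

        // The body is the same name=value&name=value list a query string uses.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < names.Length; i++)
        {
            if (i > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(names[i]));
            body.Append('=');
            body.Append(Uri.EscapeDataString(values[i]));
        }
        string bodyText = body.ToString();

        StringBuilder request = new StringBuilder();
        request.Append("POST " + path + " HTTP/1.1\r\n");
        request.Append("Host: " + host + "\r\n");
        request.Append("Content-Type: application/x-www-form-urlencoded\r\n");
        // The escaped body is pure ASCII, so its byte count equals its length.
        request.Append("Content-Length: " + Encoding.ASCII.GetByteCount(bodyText) + "\r\n");
        request.Append("\r\n");     // blank line: end of headers, start of body
        request.Append(bodyText);
        return request.ToString();
    }
}
```

Note that Uri.EscapeDataString writes a space as %20 where browsers usually write "+"; servers accept both forms.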
HTTP Response
The response the web server sends to the client is similar, in format, to the request.
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Thu, 08 Dec 2005 16:39:39 GMT
X-AspNet-Version: 1.1.4322
Set-Cookie: ASP.NET_SessionId=xshitibm0r1nlpawjvwfzn55; path=/
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 3579

<html>
<head>
<title>Home Page</title>
</head>
<body>
...
The first line lets us know that the request was good, the page was found and no
errors occurred. If anything had gone wrong (or a redirection occurred, etc.) we
would have seen something other than "200 OK".
The Content-Type header lets us know what kind of data is being returned. In
this case we are told to expect plain old HTML. If we had requested an image file
we would have seen "image/jpeg", for example.
Content-Length tells the client exactly how much data to expect to be returned.
Two CRLFs in a row designate the end of the header and the start of the response
data (in this case HTML).
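The header/body split described above can be sketched in a few lines. ResponseSketch and ParseHeaders are hypothetical names; the sketch assumes the whole response is already in a string and ignores details such as chunked transfer encoding.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class ResponseSketch
{
    // Returns the headers; the status line and body come back via out parameters.
    public static Dictionary<string, string> ParseHeaders(
        string response, out string statusLine, out string body)
    {
        // Two CRLFs in a row mark the end of the header block.
        int end = response.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        if (end < 0)
            throw new FormatException("No blank line between headers and body");

        string[] lines = response.Substring(0, end)
            .Split(new string[] { "\r\n" }, StringSplitOptions.None);
        statusLine = lines[0];
        body = response.Substring(end + 4);

        // Header names are case-insensitive.
        Dictionary<string, string> headers =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length; i++)
        {
            // Split on the first colon only; values such as dates contain colons too.
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) continue;
            headers[lines[i].Substring(0, colon).Trim()] =
                lines[i].Substring(colon + 1).Trim();
        }
        return headers;
    }
}
```

A client would then read headers["Content-Length"] to know how many bytes of body to expect and headers["Content-Type"] to decide how to interpret them.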
Web Session
So if HTTP only sends these simple requests and responses how does a web server
maintain a session for the user? This is not done by keeping a socket connected
to the server. Rather a cookie is used to set and send a session ID. This is the
ASP.NET_SessionId cookie in the GET and response examples. The cookie is set the
first time the client
makes an HTTP request to the web server. The client then passes the cookie back
to the web server on each subsequent HTTP request. If the server doesn't receive
an HTTP request from a client in a set amount of time (session time-out) the memory
reserved for the session is released and the client will receive a new session ID
on the next request. This usually logs the user out of whatever web application
s/he was signed into.
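The bookkeeping behind this can be sketched as a table of sessions keyed by the cookie value. SessionTableSketch and Touch are hypothetical names, and ASP.NET's real session state is considerably more involved; the sketch only shows the lookup, the time-out check and the issuing of a new ID.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a server-side session table, for illustration only.
public class SessionTableSketch
{
    private class Entry
    {
        public DateTime LastSeen;
        public Dictionary<string, object> Items = new Dictionary<string, object>();
    }

    private readonly Dictionary<string, Entry> sessions = new Dictionary<string, Entry>();
    private readonly TimeSpan timeout;

    public SessionTableSketch(TimeSpan timeout)
    {
        this.timeout = timeout;
    }

    // cookieId is the session cookie sent by the client, or null if none was sent.
    // Returns the ID to send back in Set-Cookie: the same one while the session
    // is alive, a fresh one if it is missing or has timed out.
    public string Touch(string cookieId, DateTime now, out Dictionary<string, object> items)
    {
        Entry entry;
        if (cookieId != null
            && sessions.TryGetValue(cookieId, out entry)
            && now - entry.LastSeen <= timeout)
        {
            entry.LastSeen = now;
            items = entry.Items;
            return cookieId;
        }

        // Unknown or expired: release the old memory and start a new session.
        if (cookieId != null) sessions.Remove(cookieId);
        string id = Guid.NewGuid().ToString("N");
        entry = new Entry();
        entry.LastSeen = now;
        sessions[id] = entry;
        items = entry.Items;
        return id;
    }
}
```

With a 20 minute time-out, a request 5 minutes after the previous one gets the same session back, while a request an hour later gets a new ID and an empty item collection, which is why the user appears to be logged out.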
The Web Browser
The web browser (IE, Firefox, etc.) is by far the most common HTTP client, and one
that everyone is familiar with. Every time the user clicks a link, types in a URL,
or hits refresh an HTTP request occurs. In fact many HTTP requests usually occur.
This is because the browser usually requests a page that returns HTML. The HTML
is then parsed to discover links to images, script files, stylesheet files, frame
or iframe sources, etc. Each time that one of these links is discovered an entirely
separate HTTP request is made to retrieve the item. So, browsing to a page can result
in many HTTP requests.
Client-Side Scripting
For an HTTP request to a page that returns HTML it is possible to return code that
will be executed in the browser. This is usually JavaScript and is contained between
<script> tags. Since this code runs inside the browser it cannot access any of the
code or memory that exists on the web server (e.g. Session variables). Of course,
data that exists on the server side can be sent to the client browser in the HTML
response. The copy of the data can then be accessed by the client-side JavaScript.
It is very important to remember that client-side JavaScript can only make something
happen on the server side via an HTTP GET or POST.
AJAX
As stated in the Client-Side Scripting section, an HTTP GET or POST must occur in
order for the browser to get data from or cause anything to happen on the server.
When the browser does either of these (they're the same, remember) the current page
is unloaded and a new page is created from the HTML in the HTTP response. This
results in "browser flash".
To avoid this flash and provide what seems like a more seamless client experience
the concept of AJAX was developed. AJAX stands for Asynchronous JavaScript and XML.
It allows the JavaScript coder to invoke special JavaScript methods that send an
HTTP request, read the response and provide the data (typically XML) back to the
caller.
AJAX is often used to invoke a web service method. This involves sending an HTTP
POST to the web server that invokes the web service method with the desired parameter
values. The data from the response, typically XML (e.g. the web service return type
is string and is formatted as XML), is then read and loaded into the browser's XML
parser.
AJAX is great for some things but there are a number of things to think about. Don't
do normal navigation through AJAX because the user loses the ability to bookmark,
use the "back" button, etc. You should also deal with the possibility
that the web server doesn't respond or responds with an HTTP error when invoking
a web service method with AJAX.
IIS administration is an art unto itself. All modern web servers have come a long
way from just serving up static web pages. However the interface to the web server
hasn't changed (much). The only thing the web server understands is HTTP. It receives
an HTTP request, does stuff, then returns a response. This section will detail the
"does stuff" part of the process when an ASP.NET page is requested.
The first concept that requires a good understanding is the concept of an IIS application.
The best way to understand an IIS application is to think of it like a process ("EXE")
that is running on the server. This running EXE is then bound to a particular directory
on the server. When a page request comes in to the web server the first thing IIS
does is determine what application it needs to pass the request off to. This is
done by looking at the URL that was specified in the first line of the HTTP request
(see above) and determining the local (e.g. "C:\...") directory the page
will be found in. The request is then forwarded to the application that is bound
to that directory. The process that is the actual web server is really nothing more
than an application broker for HTTP requests. The only thing of significance the
web server does is pass a stream to the application that can be used to write the
response back to the client.
To illustrate this let's imagine IIS is configured with an application on a directory
- "D:\MyStoreWeb". Since this directory is not under "/Inetpub/wwwroot"
it would also be configured as a Virtual Directory. We'll say that the alias for
this directory is configured as "store". Given this the web server will
map the URL "http://serverdnsname/store" to the directory "D:\MyStoreWeb".
Note that you can effectively think of an IIS Application and an IIS Virtual Directory
as the exact same thing except that a Virtual Directory is (usually) located outside
of the "/Inetpub/wwwroot" directory.
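The URL-to-directory mapping can be sketched as a longest-prefix match over the configured aliases. VirtualDirectorySketch and MapPath are hypothetical names, and the default root "C:\Inetpub\wwwroot" used in the usage note is an assumption; this is not IIS's actual lookup code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class VirtualDirectorySketch
{
    // aliases maps an alias such as "store" to a directory such as "D:\MyStoreWeb".
    // Paths that match no alias fall back to defaultRoot.
    public static string MapPath(
        IDictionary<string, string> aliases, string defaultRoot, string urlPath)
    {
        string bestPrefix = "";
        string bestRoot = defaultRoot;
        foreach (KeyValuePair<string, string> pair in aliases)
        {
            string prefix = "/" + pair.Key.Trim('/');
            // "/store" matches "/store" and "/store/...", but not "/storefront".
            bool matches =
                urlPath.Equals(prefix, StringComparison.OrdinalIgnoreCase) ||
                urlPath.StartsWith(prefix + "/", StringComparison.OrdinalIgnoreCase);
            if (matches && prefix.Length > bestPrefix.Length)
            {
                bestPrefix = prefix;
                bestRoot = pair.Value;
            }
        }

        // Whatever follows the alias is a path relative to the mapped directory.
        string rest = urlPath.Substring(bestPrefix.Length).TrimStart('/').Replace('/', '\\');
        return rest.Length == 0 ? bestRoot : bestRoot.TrimEnd('\\') + "\\" + rest;
    }
}
```

With the alias "store" mapped to "D:\MyStoreWeb", the path "/store/default.aspx" maps to "D:\MyStoreWeb\default.aspx", while "/other/page.html" falls through to the default root.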
The very first time an HTTP request comes in for any file below "http://serverdnsname/store"
(e.g. "http://serverdnsname/store/default.aspx") IIS will start the "store"
application. This is equivalent to launching an "EXE" on the server. This
is why the first hit to an application after the server has been rebooted or "iisreset"
has been run is slow. This is also when the System.Web.HttpApplication's
Application_Start method is invoked. Usually this method is implemented in the
Global.asax file that VS generates. It is important to note that static class
members are global to the entire application just like they would be in a full-blown
Windows process. This means that static class members are shared by all web request
sessions.
You can tell if a folder is an application by opening Internet Services Manager,
finding the folder (or Virtual Directory), right-clicking on it and viewing its
Properties. On the Directory tab there will be a box labelled "Application
name:". If it is greyed out there is no application. If it is enabled and a
name (alias) has been specified there is an application. To remove an application
click the "Remove" button beside it.
In .NET there is a complicating factor. A "Web.config" file in a web directory
also starts a new application. Not a full-blown IIS application that you can see
in Internet Services Manager, but enough of one such that new Application and Session
objects are created. This causes a problem commonly encountered when you use Visual
Studio to create a new "Web Application" project in a subdirectory under
an existing IIS application. VS automatically creates both an IIS application as
well as new "Web.config" and "Global.asax" files. All of these
must be removed or application and session information from the "parent"
application will be unavailable to the "child" application. In VS 2005
this problem goes away because a special web server is used for coding/debugging
and IIS is left alone.
Hopefully this all makes sense because now it gets more complicated. What I've stated
so far is a simplification of what an IIS application is for the case that you only
have pages that are from a single technology base (like .NET). One of the things
that can be configured (individually) for an IIS application is what happens for
each type of page that is requested. This is done by configuring a file extension
to be handled by what is called an ISAPI extension. This is done by clicking the
"Configuration" button on the application's "[Virtual] Directory"
tab on its "Properties" window. The "App Mappings" tab shows
each file extension and what ISAPI dll will handle it. IIS streams the contents
of files not listed here (typically .html, .gif, .jpg, etc.) directly back to the
client. A file request for any of the files listed here is passed to the specified
ISAPI DLL. This DLL then becomes responsible for reading the file (if there even
is one, the file doesn't really have to exist at all!), interpreting the contents
and writing the response. The complication here is that each DLL maintains its
own application and session objects. So really, a single IIS application can have
many application objects and multiple sessions for a client that accesses pages
of differing technologies. Each ISAPI dll can independently override the IIS application's
session time-out value.
Like a normal Windows process an IIS application runs under a Windows user account.
By default this is a low-privilege account like the built-in IUSR_<computername>
account. In IIS this is configured for the application by clicking the top "Edit"
button in the "Directory Security" tab in the application's "Properties"
window. "Windows Authentication" can be used to have web requests run
under a Windows account that is specified by the client browser. Otherwise the request
is considered anonymous and runs under whatever user is configured as the "Account
used for anonymous access". A complication here is .NET. Unless the web request
is set to run under "Windows Authentication" .NET will use a special, low-privilege
account called ASPNET. It will ignore whatever you set as the "Account used
for anonymous access" in IIS. To get it to use the account specified here,
you need to add the following to the system.web section of your application's
"Web.config" file:
<identity
impersonate="true"
/>
Enough about IIS applications already! Let's start looking at what happens when
a request for a .NET page (.aspx, .asmx) comes into the web server.
Loading the Page
So far we have:
- HTTP request for "/store/Default.aspx" hits the web server (IIS).
- IIS realizes the "store" application handles this request.
- IIS looks at the "App Mappings" configured for the "store" application and sees that ".aspx" is mapped to "aspnet_isapi.dll".
- IIS passes the request to "aspnet_isapi.dll" and gives it a stream to write the response back to the client.
What happens next is that "aspnet_isapi.dll" contacts (or starts if this
is the first .NET request) a process called "aspnet_wp.exe". The request
is then forwarded to this process. Now the hot-potato shuffling of the web request
is finally done. "aspnet_wp.exe", referred to as "ASP.NET" from
now on, is actually going to do something!
The first thing that ASP.NET does is look at the location of the requested file
and find the "nearest" Web.config file for it. "Nearest" being
the first Web.config it finds as it looks in the directories from the file's
directory up to the IIS application's web root. If the Web.config file has been
loaded and a .NET application has been created for it, ASP.NET uses this one, otherwise
a new one is created. In .NET this application object is of type
System.Web.HttpApplication; its shared state (of type HttpApplicationState) is
generally accessed via a Page's Application property.
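The "nearest Web.config" search described above amounts to walking up the directory tree until a file is found or the web root has been checked. The sketch below uses hypothetical names (NearestConfigSketch, Find) and takes the file-exists check as a delegate so that it can be tried without a real disk; it is a model of the idea, not ASP.NET's configuration loader.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class NearestConfigSketch
{
    // Walks from fileDirectory up to appRoot and returns the first Web.config
    // that exists, or null if there is none. Paths use '\' separators.
    public static string Find(string appRoot, string fileDirectory, Predicate<string> fileExists)
    {
        string root = appRoot.TrimEnd('\\');
        string dir = fileDirectory.TrimEnd('\\');

        while (dir.Length >= root.Length &&
               dir.StartsWith(root, StringComparison.OrdinalIgnoreCase))
        {
            string candidate = dir + "\\Web.config";
            if (fileExists(candidate)) return candidate;

            if (dir.Length == root.Length) break;   // the web root was the last stop
            int slash = dir.LastIndexOf('\\');
            if (slash < 0) break;
            dir = dir.Substring(0, slash);           // move up one directory
        }
        return null;
    }
}
```

In real code the delegate would simply be System.IO.File.Exists. For a page in "D:\MyStoreWeb\admin\reports", a Web.config in "D:\MyStoreWeb\admin" wins over one in "D:\MyStoreWeb".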
Next it finds or creates the session object for the web request. This object is
of type HttpSessionState and is usually accessed by
the Page's Session property.
The next step is important to understand when debugging. ASP.NET looks in the application
folder's "bin" directory and loads all the DLLs into memory. As it loads
each DLL it finds all the namespaces in them and (I'm assuming) stores them in an
easily referenced list. Note that this loading of DLLs only occurs when the application
is starting up (on the first page request).
The next thing ASP.NET checks (OK, I'm sure it's doing a LOT more, but none of it
is relevant here) is if any "HTTP Modules" have been added to the application.
An HTTP module is a way of intercepting the request at this stage, before ASP.NET
goes nuts and starts actually parsing the file, running the code behind, etc. An
HTTP module can take over the response stream and preempt "normal" ASP.NET
processing. HTTP modules are added to the application in the Web.config file, for
example:
<httpModules>
  <add name="WebChartImageStream"
       type="Company.Product.Web.Module.HttpModules.ClassName, Company.Product.Web.Module.HttpModules" />
</httpModules>
The "type" attribute of the add element should be read as:
type="full class name, assembly name"
(the assembly name is the DLL's file name without the ".dll" extension).
Anyway, let's ignore HTTP modules or at least pretend that one is not intercepting
our web request. The next thing that happens is that ASP.NET actually reads the
contents of the requested file. It's going to categorize the file into one of two
types:
- A plain, boring, static content page.
- An ASP.NET page with web controls and server-side processing code.
It does this by looking for a line with an @Page directive.
If no such line is found ASP.NET assumes the page has nothing but boring static
content and immediately renders the page contents to the response stream. Otherwise
it starts processing the server-side directives. An (ASP.NET 1.1) @Page directive
might look like:
<%@ Page language="c#" Codebehind="Default.aspx.cs" AutoEventWireup="false"
Inherits="Company.Product.Web.Module.DefaultPage" %>
The "language" and "Codebehind" attributes are self-explanatory.
"AutoEventWireup" can be ignored for now. It's the "Inherits"
attribute that is really important when trying to solve problems. This value specifies
the full namespace-qualified class name that handles this page/file. Everything
before the last "." is the namespace of the class. Remember the list of
namespaces that ASP.NET loaded from the DLLs in the "bin" directory. ASP.NET
will now use this list to find the DLL that contains the class specified in the
"Inherits" attribute. The class can then be instantiated and used to handle
the page request. If ASP.NET could not find the namespace, or the class in the namespace,
it will write the following, commonly seen, error to the response stream:
Parser Error Message: Could not load type 'Company.Product.Web.NotDefault'.
So, "could not load type" means "could not find the DLL, the namespace,
or the class". When you get this error first check that the DLL for the page
is in the web app's "bin" directory. Next check that the class name specified
in the "Inherits" attribute is correct. If that seems good check the namespace
carefully. Use ILDASM to make sure the DLL in the "bin" really does contain
the class. If all this seems correct then it is probably a dependency issue. A "could
not load type" error may also occur when the .NET framework cannot find
a DLL/namespace/class referenced by the DLL it is complaining about. In this
case use ILDASM to identify the dependencies and perform the same file/namespace/class
analysis for each dependency. If all else fails make sure the "bin" directory
you are looking at is indeed correct for the application - perhaps another IIS application
or Web.config file is causing another "bin" directory to be used.
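When chasing a "could not load type" error it can help to ask the runtime directly whether a name resolves. The sketch below is a small diagnostic built on Type.GetType; the wrapper names (TypeProbeSketch, Describe) are made up, and it only sees assemblies that the running process can locate, so it would have to run from the web application's context to say anything about its "bin" directory.

```csharp
using System;

// Hypothetical diagnostic helper, for illustration only.
public static class TypeProbeSketch
{
    // name is "Namespace.Class" or "Namespace.Class, AssemblyName".
    public static string Describe(string name)
    {
        // throwOnError: false makes GetType return null when the type or its
        // assembly cannot be found, instead of throwing.
        Type type = Type.GetType(name, false);
        if (type == null)
            return "could not load type '" + name + "'";
        return "loaded " + type.FullName + " from " + type.Assembly.GetName().Name;
    }
}
```

Calling Describe("Company.Product.Web.NotDefault, Company.Product.Web") in a process that cannot find that assembly reports the failure, while a name such as "System.String" resolves and reports which assembly it came from.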
Before going on to parsing the page there is one more thing the framework does that
should be mentioned. This is the creation of the HttpRequest and HttpResponse objects
that are commonly accessed via the Page's Request and Response properties respectively.
The creation of the HttpRequest includes the populating of the Form (if POST) and
QueryString (if "?..." in URL) collections.
Parsing the Page & Creating the Controls
ASP.NET is now ready to start parsing the page's file. A key part to understanding
this process is to understand that the "Inherits" attribute of the @Page directive
specifies a class that must inherit from System.Web.UI.Page. This class contains a
property called Controls that is of type System.Web.UI.ControlCollection. This
collection contains a list of instances of classes that inherit from
System.Web.UI.Control. In other words, an ASP.NET Page contains a list of
sub-controls.
This Control class has a method called Render that writes
HTML back to the response stream. So when it comes time (we'll get there!) to send
the page's HTML back to the client ASP.NET iterates through the Control
instances and calls the Render method on each one. Essentially the
page doesn't contain any HTML of its own - all the HTML is contained in its sub-controls.
Although it is possible to add controls to a Page's Controls collection directly
this is seldom done. ASP.NET automatically populates this collection for you. To
illustrate what it does, let's consider the following example:
<%@ Page language="c#" Codebehind="Default.aspx.cs" AutoEventWireup="false"
Inherits="Company.Product.Web.Module.DefaultPage" %>
<html>
<head>
<title>Home Page</title>
</head>
<body>
<p>Home Page</p>
<p>User Name:
<asp:textbox
id="loginIdTextBox"
runat="server"
width="50px"></asp:textbox></p>
</body>
</html>
Since plain old HTML can be rendered directly to the output stream ASP.NET will
parse HTML text blocks into a string that is assigned to an instance of
System.Web.UI.LiteralControl. This LiteralControl instance is then added to the
Page's Controls collection. When the parser comes across an element that has an
attribute called "runat" that is set to "server" (referred to as a server-side
control) it does not use a LiteralControl. Instead it looks at the element name to
determine what control type to instantiate. In the case of our example the parser
will find the asp:textbox element. Element names prefixed with "asp:" are known by
the ASP.NET framework as controls to be found in the System.Web.UI.WebControls
namespace. So ASP.NET will instantiate an instance of the
System.Web.UI.WebControls.TextBox class (notice that ASP.NET is case-insensitive
here). The attributes of a server-side control element are assigned to properties
of the class instance.
When ASP.NET is finished parsing our example page the Page's
Controls will contain the following items:
- LiteralControl instance, .Text = "<html> ... User Name:"
- System.Web.UI.WebControls.TextBox instance, .ID = "loginIdTextBox", .Width = "50px"
- LiteralControl instance, .Text = "</p> ... </html>"
Remember that each Control in the collection has a method called Render. This method
renders the control to the response stream (i.e. renders HTML back to the browser).
A LiteralControl simply renders its Text. A TextBox control obviously does a bit
more since it will need to examine its ID, Width, etc. properties to determine what
it should write to the response stream - e.g. "<input type=text id=loginIdTextBox
style=width:50px; ... >".
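The "page is just a list of controls that each render themselves" idea can be modelled in a few lines. The classes below are hypothetical stand-ins (SketchControl, SketchLiteral, SketchTextBox, SketchPage), not the real System.Web.UI types, and the markup the real TextBox emits differs in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins for the System.Web.UI classes, for illustration only.
public abstract class SketchControl
{
    public string ID;
    public abstract void Render(StringBuilder output);
}

// Plays the role of LiteralControl: renders its text unchanged.
public class SketchLiteral : SketchControl
{
    public string Text;
    public SketchLiteral(string text) { Text = text; }
    public override void Render(StringBuilder output) { output.Append(Text); }
}

// Plays the role of TextBox: builds an INPUT element from its properties.
// (No HTML encoding is done here; a real control must encode its values.)
public class SketchTextBox : SketchControl
{
    public string Width;
    public string Text = "";
    public override void Render(StringBuilder output)
    {
        output.Append("<input type=\"text\" id=\"" + ID + "\" value=\"" + Text +
                      "\" style=\"width:" + Width + ";\" />");
    }
}

// Plays the role of Page: it has no HTML of its own, only sub-controls.
public class SketchPage
{
    public List<SketchControl> Controls = new List<SketchControl>();

    public string Render()
    {
        StringBuilder output = new StringBuilder();
        foreach (SketchControl control in Controls)
            control.Render(output);
        return output.ToString();
    }
}
```

Adding a literal, a text box with ID "loginIdTextBox" and another literal to Controls, then calling Render, produces the three pieces concatenated in order, which mirrors the three-item collection listed above.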
This Controls collection is now programmatically available in any of the Page's
event handlers (e.g. Init, Load, etc.; more about these later). It would be possible
to iterate through the collection, find the TextBox with an ID of "loginIdTextBox",
cast it to a local TextBox variable and manipulate any of its properties. This is
seldom done though because ASP.NET provides a mechanism that allows the controls to
be referenced in a much nicer fashion.
This is done by creating a protected class variable of the type implied by the HTML
element ("asp:textbox" = System.Web.UI.WebControls.TextBox). The name of the class
variable must match the ID of the HTML element. After the Controls collection has
been populated ASP.NET iterates through it and sets the appropriate protected class
variable for each Control. LiteralControl instances are ignored since they don't
have an ID. So for our example our "codebehind" class will have the following
line:
protected System.Web.UI.WebControls.TextBox loginIdTextBox;
If, for example, we want to set the text of the loginID input box in the
Page's Load event handler we could do the
following:
private void Page_Load(object sender, System.EventArgs e)
{
    this.loginIdTextBox.Text = "test";
}
Note that ASP.NET does not require a server-side control with an ID to have a corresponding
class variable.
Page Lifecycle
Now that we're done with the parsing stage we're going to enter the stages that the developer
gets to intercept and insert custom code into. First let's recap the stages so far:
- Load/create .NET Application, Session, Response and Request objects for the
request.
- Find page file.
- Find page DLL file.
- Instantiate Page class instance.
- Parse page, populate the Page instance's
Controls collection, set the class control variables.
The next stages are:
- PreInit
- Init
- Load ViewState
- Load
- PreRender
- PreRenderComplete
- Render
- Unload
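The order of these stages, and how a derived page hooks into one of them, can be modelled with a chain of virtual methods. The classes below are hypothetical (SketchLifecyclePage, SketchDefaultPage) and only model the call order listed above; the real System.Web.UI.Page does far more in each stage.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for System.Web.UI.Page; only the call order is modelled.
public class SketchLifecyclePage
{
    public readonly List<string> Log = new List<string>();

    protected virtual void OnPreInit() { Log.Add("PreInit"); }
    protected virtual void OnInit() { Log.Add("Init"); }
    protected virtual void LoadViewState() { Log.Add("Load ViewState"); }
    protected virtual void OnLoad() { Log.Add("Load"); }
    protected virtual void OnPreRender() { Log.Add("PreRender"); }
    protected virtual void OnPreRenderComplete() { Log.Add("PreRenderComplete"); }
    protected virtual void Render() { Log.Add("Render"); }
    protected virtual void OnUnload() { Log.Add("Unload"); }

    // The framework, not the developer, drives the stages in this fixed order.
    public void ProcessRequest()
    {
        OnPreInit();
        OnInit();
        LoadViewState();
        OnLoad();
        OnPreRender();
        OnPreRenderComplete();
        Render();
        OnUnload();
    }
}

// A page class inserts its own code by overriding a stage method.
public class SketchDefaultPage : SketchLifecyclePage
{
    protected override void OnLoad()
    {
        base.OnLoad();                  // keep the base behaviour
        Log.Add("custom Load code");    // then run page-specific code
    }
}
```

Running ProcessRequest on a SketchDefaultPage logs the eight stages in order with "custom Load code" directly after "Load".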
PreInit
Introduced in ASP.NET 2.0, this stage can be used to set themes programmatically.
Note that if the page has a master page associated with it the controls for the
page will not be available (i.e. will be null) in this stage.
Init
The Init stage begins with the execution of the Page's
OnInit method. This method is virtual so it can be
overridden in our Page class derivative.
Load ViewState
To understand what ViewState is and what it is good for we need to remember that
while the user is looking at an ASPX page nothing about the page is "remembered"
on the web server. In other words, after ASP.NET is done loading a page, setting
up all the control objects, and finally writing the response stream, all the information
about the page/request is released. If the user causes the page to be POSTed back
to the server the entire page object and all its controls must be entirely rebuilt.
Suppose for example a page contains a server-side data list control with a SortBy
property. When the user clicks a column header a POST occurs to set the SortBy
property and thereby render the data list sorted by that column. Now let's say that
the user has clicked on the "Name" column. A POST occurs, the SortBy property is set
to "Name" and the response that is written back contains the data, sorted by name.
All is good. Then the user makes some changes to the filtering criteria for the list,
which causes the page to be POSTed back to the server. The Page object is rebuilt
from scratch, including the controls. This means that the SortBy property will be
set to whatever its default value is. The "Name" sorting has been lost.
In pre-ViewState days (ASP) we would have solved this problem by storing the sort
clause in a POST-able HTML element like INPUT (type="hidden"). We would then have
set the value of this element before POSTing, read the value out of the Request.Form
object and manually set the (equivalent of the) SortBy property. Saving the state of
all these properties was time-consuming and laborious.
What ViewState is then is a mechanism to have control properties automatically be
saved and restored across HTTP POSTs. So the control
writer, instead of reading/writing a property from/to a class variable, like so:
public string SortBy
{
    get
    {
        return this.sortBy;
    }
    set
    {
        this.sortBy = value;
    }
}
would read/write the value from/to ViewState as follows:
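public string SortBy
{
    get
    {
        // ViewState returns null until the property has been set once
        object stored = this.ViewState["sortby"];
        return stored == null ? string.Empty : stored.ToString();
    }
    set
    {
        this.ViewState["sortby"] = value;
    }
}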
Values stored through the ViewState property are preserved across HTTP POSTs. An
important point to stress is that ASP.NET loads information from ViewState and sets
the appropriate controls' properties between the Init and Load stages. So if you set
a control property in the Init stage it may get overwritten by a ViewState value and
therefore be different in the Load stage.
If you're curious to know how ASP.NET does the magic behind ViewState read on. Otherwise
skip to the next section.
In order to save the ViewState information that has been set ASP.NET writes it to
the response stream. It does this by adding an HTML INPUT element (type="hidden")
to the page. This element is given the name "__VIEWSTATE". The Page's ViewState
object is then serialized as a string, conceptually a list of "control.property=value"
pairs (e.g. "datalist1.sortby=name&datalist1.userid=123"). This string is then
Base64-encoded (encoded, not encrypted) and given as the value for the "__VIEWSTATE"
element. When the page is POSTed back this "__VIEWSTATE" is POSTed with it. ASP.NET
can then retrieve its value, decode it and set the appropriate control properties.
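The round trip through the hidden field can be sketched with the simplified name=value format used in the description above. ViewStateSketch, Save and Load are hypothetical names, and the real ViewState serialization format is different and more compact; the point is only that Base64 is a reversible encoding that anyone can decode.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model of the __VIEWSTATE round trip, for illustration only.
public static class ViewStateSketch
{
    // Produces the value that would go into the hidden __VIEWSTATE field.
    public static string Save(IDictionary<string, string> state)
    {
        StringBuilder pairs = new StringBuilder();
        foreach (KeyValuePair<string, string> pair in state)
        {
            if (pairs.Length > 0) pairs.Append('&');
            pairs.Append(Uri.EscapeDataString(pair.Key));
            pairs.Append('=');
            pairs.Append(Uri.EscapeDataString(pair.Value));
        }
        // Base64 only makes the text safe to embed in HTML; it hides nothing.
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(pairs.ToString()));
    }

    // Rebuilds the property values from the field posted back by the browser.
    public static Dictionary<string, string> Load(string field)
    {
        Dictionary<string, string> state = new Dictionary<string, string>();
        string pairs = Encoding.UTF8.GetString(Convert.FromBase64String(field));
        foreach (string pair in pairs.Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0) continue;
            state[Uri.UnescapeDataString(pair.Substring(0, eq))] =
                Uri.UnescapeDataString(pair.Substring(eq + 1));
        }
        return state;
    }
}
```

Saving a dictionary containing "datalist1.sortby" = "name" and loading the resulting string gives the same pair back, which is all the server needs in order to restore SortBy on the next POST.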
Load
The Load stage begins with the execution of the Page's OnLoad method. This method
is virtual so it can be overridden in our Page class derivative. However it is more
common, and better practice, to attach a handler to the Page's Load event in the
OnInit method. When creating a new page Visual Studio does this for you and names
the event handler method "Page_Load". This method is the recommended place for doing
normal page/control initialization.
PreRender
Overriding the Page.OnPreRender method or handling
the PreRender event allows for code to be executed
just before the page/control is about to be rendered.
PreRenderComplete
This event was introduced in ASP.NET 2.0. Its purpose is to provide an event handler
that can be used to do work immediately after asynchronous events have finished. See
Asynchronous Calls in a Web Page for more information.
Render
This is the stage where the page and all its sub-controls are rendered to the response
stream. For a page this method is implemented by the ASP.NET framework and does
all the work for you. For a custom control you will need to override and implement
the Render method to have your control render the appropriate
HTML.
Unload
Now everything about the HTTP request is released starting with the ASP.NET page
and controls and going all the way up the stack to the ISAPI extension and possibly
even the socket. Nothing about the request remains except a line in the web server's
log file.
At this point your brain is recoiling from knowing WAY too much about an ASP.NET
page request. If you actually managed to read the whole document through at once
you will probably be suffering from information overload. That's OK, just refer
to the document as needed. However, if you're going to be doing custom web control
or web framework development I encourage you to know as much about this stuff as
possible. Reread as necessary until you understand it all.