What is the process manager?
The process manager is an executable program that is responsible for controlling the execution of the FastCGI applications. The control is accomplished through starting the applications, restarting them if they have terminated for some reason and deciding when the FastCGI application should no longer run, as in the case of dynamically started FastCGI apps.
Why is it needed ?
The FastCGI protocol depends on the persistent applications being run during the web server execution to handle incoming HTTP requests. In most cases these persistent applications must be started prior to serving any requests forwarded to them by the web server (an exception are the dynamically started FastCGI applications). Should an application die during processing, it should be immediately restarted, so that it handle the next request. It is the job of the process manager to maintain this control over the execution of the FastCGI applications.
What is going on when the webserver starts ?
When the web server reads its configuration files and encounters a directive in which a certain module maybe interested in, it calls a corresponding function in that module to handle the processing of the directive. In FastCGI module mod_fastcgi, these directives (AppClass and ExternalAppClass) are used to specify what FastCGI applications should be started/restarted by the process manager in order to handle incoming requests. After the web serve has finished reading its configuration files, a process manager process is executed, which in turn executes all preconfigured FastCGI applications, so that they can begin accepting requests. The process manager then enters an infinite loop until the time that the web server has informed it [process manager] about the termination, at which time the process manager’s duties include terminating all child processes (FastCGI applications), freeing up any used resources, such as memory, file descriptors, etc. and terminating.
How does the process manager interact with the dynamically started FastCGI applications ?
Some of the process manager functionality has been changed to accomodate the dynamically created FastCGI processes, i.e. the processes that do not need to be configured using the AppClass directive, but are started on their first invokation just like CGI. Since the process manager has no prior information about the name of the FastCGI application executable that needs to be created, it has to obtain that information from somewhere. However, only the web server knows the name of the FastCGI executable from the HTTP request line, so it puts an information into a file and signals the process manager to create the given FastCGI applications. It is important to note that locking is needed to avoid multiple web server processes writing information into the file at the same time, causing unpredictable behavior.
Once the process manager obtains the name of the FastCGI executable from the file, it creates it and treats it just like any other preconfigured FastCGI application. An important exception is that the dynamically started applications will NOT be restarted, since they may be faulty and we do not want them to run or the process manager has decided that they should be terminated and so it should not attempt to restart them (see below). The process manager only created one copy of the given FastCGI application as it is not aware of how “popular” this application is or will be. Therefore, it needs to perform some data analysis to decide whether another copy of the applications should be started or whether an application has too many instances running and one of them should be terminated or whether all of them should be terminated. The data to be analyzed comed via the same file mechanism from the web server processes.
Since there are many web server processes and only one process manager, the data processing is done within the process manager process. This also means that the process manager is implementing a policy, be deciding which FastCGI applications should have another instance created and which should have their instances terminated. The decisions on termination of the FastCGI apps are made using the following heurisitics (top-level design):
-
given the last time period in which the data analysis was done (which is specified via -updateInterval option to FCGIConfig), for each dynamically started FastCGI application, calculate how much of that time was spent by the application handling the request (the information about it comes from the web server process that knows when the start and end times of the request to the FastCGI application).
-
since the last data analysis period the web server might have been a spike of incoming requests or a dead interval after a number of periods of great activity. As such, you do not want to make your decisions on whether to terminate the FastCGI application just based on the time interval elapsed since the last data analysis was performed. Therefore, an exponential decay was introduced to smooth out the possible spikes in the web server. The formula for calculating the “new popularity” is:
- result = (1-gain)old_value + gainnew_value; old_value = new_value;
-
where the old_value is the load of the FastCGI application calculated during the previous data analysis periods, the new_value is the currently calculated load factor and the gain is the value in the range 0..1 specified via -gainValue directive. As one can see, specifying the gain to be 1 would put more emphasis onto new data (which is useful if you have very large updateInterval or if you would like your FastCGI processes to be responsive, so that if many requests come in, many copies of the FastCGI application are started immediately and if none come in, many copies of the FastCGI application are to be terminated). Specifying the value closer to 0 would put a larger emphasis onto old data, smoothing out the starting and termination of the FastCGI processes.
-
once the new “normalized” load factor is calculated using the above formula, it is compared with the parameters specified via -singleThreshhold and -multiThreshhold options to the FCGIConfig directive. If the given FastCGI application has multiple instances of it running and the “normalized” load factor has fallen below -multiThreshhold, a single instance of that application is marked to be terminated. If the application has only a single copy executing and the “normalize” load factor has fallen below -singleThreshhold, then the only instance is also marked for termination. Since we would like at least a single copy of the application to be run (even if it is not very popular), singleThreshhold parameter should be much much less than multiThreshhold.
-
the process manager also does not want to kill off all FastCGI processes during the long periods of inactivity (say at nighttime), since the cost of starting a process may be high. A -minProcesses parameter is used to tell the process manager NOT to mark anymore processes to be terminated once this minimum is reached just for that case.
-
the actual termination of the dynamically started FastCGI processes takes place later on, which involves some synchornization and locking techniques to avoid receiving and processing a request by the application that is about to be terminated. The actual termination is governed by the -killInterval and -processSlack options to the FCGIConfig directive. In the normal case, the process manager would perform its killing policy (just terminating the FastCGI applications that have been marked as victims during the data analysis stage) every n seconds, where n is the number specified as a parameter to the -killInterval option. However, it may come the time that the web servers are very busy servicing a lot of FastCGI requests. The process manager governs the ceiling on the total number of dynamically started processes that can be running at any one time. So, if web server is asking to start a number of the FastCGI processes, but starting them would exceed the ceiling value, the process manager needs to implement the killing policy immediately, i.e.
- if (#of running processes + processSlack > #max processes) do_killing_policy_now();
The starting of the dynamically started FastCGI processes is also governed by a few options to the FCGIConfig directive. After the web server recieves a request for the FastCGI application, it first determines if the current application has already been started. It is has not, the web server sends a request to the process manager, asking it to start the given FastCGI application. After that the web server attempts to connect to FastCGI applicaton via connect() system call. The attempt may fail, since the process manager needs time to process the data and start the corresponding FastCGI application. To allow for that contingency, a -startDelay option is used to specify a time interval that the web server is going to wait trying to connect to the application. If the connection is still unsuccessful, it will issue another request to start FastCGI application (this is done in case there are FastCGI application currently running but all of them maybe servicing requests or server is implementing a killing policy or something else). The web server keeps repeating its attempts until number of seconds specified via -appConnTimeout parameter has expired, which is usually an indicator of either bad configuration (low values for killInterval and very high load or low value for multiThreshhold, etc) or a bad FastCGI script, where the process manager is unable to start it.
What does the future hold for the process manager ?
The future enhancements to the process manager includes complete separation of process manager functionality from the web server core and providing web-server independent mechanism of informing the process manager of the important events, such as start/termination of the web server, configuration file parsing, etc. This would also aid in porting the process manager functionality to other operating systems. Finally, an overall redesign may be needed to optimize the management of the FastCGI applications, possibly using OS-dependent facilities.
Stanley Gambarin <stanleyg@cs.bu.edu> 1997/09/27 12:23:23 PM PST