Regression is used for forecasting by establishing a mathematical relationship between two or more variables. We are interested in identifying relationships between variables and demand. If we know that something has caused demand to behave in a certain way in the past, we would like to identify that relationship so if the same thing happens again in the future, we can predict what demand will be. For example, there is a relationship between increased demand in new housing and lower interest rates. Correspondingly, a whole myriad of building products and services display increased demand if new housing starts increase. The rapid increase in sales of VCRs has resulted in an increase in demand for video movies.

The simplest form of regression is linear regression, which we used previously to develop a linear trend line for forecasting. Now we will show how to develop a regression model for variables related to demand other than time.

**Linear regression** is a mathematical technique that relates one variable, called an *independent variable,* to another, the *dependent variable,* in the form of an equation for a straight line. A linear equation has the following general form:

Because we want to use linear regression as a forecasting model for demand, the dependent variable, *y,* represents demand, and *x* is an independent variable that causes demand to behave in a linear manner.

To develop the linear equation, the slope, *b,* and the intercept, *a,* must first be computed using the following least squares formulas:

| ||

The State University athletic department wants to develop its budget for the coming year using a forecast for football attendance. Football attendance accounts for the largest portion of its revenues, and the athletic director believes attendance is directly related to the number of wins by the team. The business manager has accumulated total annual attendance figures for the past eight years: Given the number of returning starters and the strength of the schedule, the athletic director believes the team will win at least seven games next year. Develop a simple regression equation for this data to forecast attendance for this level of success.
The computations necessary to compute
Substituting these values for
Thus, for
The data points with the regression line are shown in the figure. Observing the regression line relative to the data points, it would appear that the data follow a distinct upward linear trend, which would indicate that the forecast should be relatively accurate. In fact, the MAD value for this forecasting model is 1.41, which suggests an accurate forecast. |

**Correlation** in a linear regression equation is a measure of the strength of the relationship between the independent and dependent variables. The formula for the correlation coefficient is

The value of *r* varies between -1.00 and +1.00, with a value of +1.00 indicating a strong linear relationship between the variables. If *r* = 1.00, then an increase in the independent variable will result in a corresponding linear increase in the dependent variable. If *r* = -1.00, an increase in the dependent variable will result in a linear decrease in the dependent variable. A value of *r* near zero implies that there is little or no linear relationship between variables.

We can determine the correlation coefficient for the linear regression equation determined in Example 10.9 by substituting most of the terms calculated for the least squares formula (except for S*y*^{2}) into the formula for *r*:

This value for the correlation coefficient is very close to 1.00, indicating a strong linear relationship between the number of wins and home attendance.

Another measure of the strength of the relationship between the variables in a linear regression equation is the **coefficient of determination.** It is computed by squaring the value of *r.* It indicates the percentage of the variation in the dependent variable that is a result of the behavior of the independent variable. For our example, *r* = 0.947; thus, the coefficient of determination is

This value for the coefficient of determination means that 89.7 percent of the amount of variation in attendance can be attributed to the number of wins by the team (with the remaining 10.3 percent due to other unexplained factors, such as weather, a good or poor start, or publicity). A value of 1.00 (or 100 percent) would indicate that attendance depends totally on wins. However, since 10.3 percent of the variation is a result of other factors, some amount of forecast error can be expected.

The development of the simple linear regression equation and the correlation coefficient for our example was not too difficult because the amount of data was relatively small. However, manual computation of the components of simple linear regression equations can become very time-consuming and cumbersome as the amount of data increases. Excel (as well as Excel OM and POM for Windows) has the capability of performing linear regression.

Exhibit 10.8 shows a spreadsheet set up to develop the linear regression forecast for Example 10.9 for the State University Athletic Department. Notice that Excel computes the slope directly with the formula "=SLOPE(B5:B12,A5:A12)" entered in cell D7 and shown on the formula bar at the top of the spreadsheet. The formula for the intercept in cell D6 is "=INTERCEPT(B5:B12,A5:A12)." The values for the slope and intercept are subsequently entered into cells D9 and F9 to form the linear regression equation. The correlation coefficient in cell D13 is computed using the formula "=CORREL(B5:B12,A5:A12)". Although it is not shown on the spreadsheet, the coefficient of determination (*r*^{2}) could be computed using the formula "=RSQ(B5:B12,A5:A12)".

A linear regression forecast can also be developed directly with Excel using the "Data Analysis" option from the Tools menu we accessed previously to develop an exponentially smoothed forecast. Exhibit 10.9 shows the selection of "Regression" from the Data Analysis menu, and Exhibit 10.10 shows the Regression window. We first enter the cells from Exhibit 10.8 that include the *y* values (for attendance), B5:B12. Next enter the *x* value cells, A5:A12. The output range is the location on the spreadsheet that you want to put the output results. This range needs to be large (18 cells by 9 cells) and not overlap with anything else on the spreadsheet. Clicking on "OK" will result in the spreadsheet shown in Exhibit 10.11. (Note that the "Summary Output" has been slightly moved around so that all the results could be included on the screen in Exhibit 10.11).

The "Summary Output" in Exhibit 10.11 provides a large amount of statistical information, the explanation and use of which is beyond the scope of this text. The essential items that we are interested in are the intercept and slope (labeled "X Variable 1") in the "Coefficients" column at the bottom of the spreadsheet, and the "Multiple R" (or correlation coefficient) value shown under "Regression Statistics."

Note that Excel OM also has a spreadsheet macro for regression analysis that can be accessed similarly to the exponentially smoothed forecast in Exhibit 10.5.

Another causal method of forecasting is **multiple regression,** a more powerful extension of linear regression. Linear regression relates demand to one other independent variable, whereas multiple regression reflects the relationship between a dependent variable and two or more independent variables. A multiple regression model has the following general form:

For example, the demand for new housing *(y)* in a region might be a function of several independent variables, including interest rates, population, housing prices, and personal income. Development and computation of the multiple regression equation, including the compilation of data, is more complex than linear regression. The only means for forecasting using multiple regression is with a computer.

To demonstrate the capability to solve multiple regression problems with Excel spreadsheets we will expand our State University athletic department example for forecasting attendance at football games that we used to demonstrate linear regression. Instead of attempting to predict attendance based on only one variable, wins, we will include a second variable for advertising and promotional expenditures as follows:

We will use the "Data Analysis" option (add-in) from the Tools menu at the top of the spreadsheet that we used in the previous section to develop our linear regression equation, and then the "Regression" option from the "Data Analysis" menu. The resulting spreadsheet with the multiple regression statistics is shown in Exhibit 10.12

Note that the data must be set up on the spreadsheet so that the two *x* variables are in adjacent columns (in this case A and B). Then we enter the "Input X Range" as A4:B12 as shown in Exhibit 10.13.

The regression coefficients for our *x* variables, wins and promotion, are shown in cells B27 and B28. Thus the multiple regression equation is formulated as

This equation can now be used to forecast attendance based on both projected football wins and promotional expenditure. For example, if the athletic department expects the team to win seven games and plans to spend $60,000 on promotion and advertising, the forecasted attendance is

If the promotional expenditure is held constant, every win will increase attendance by 3,560.99, whereas if the wins are held constant, every $1,000 of advertising spent will increase attendance by 36.8 fans. This would seem to suggest that the number of wins has a more significant impact on attendance than promotional expenditures.

*r*^{2}, the coefficient of determination, shown in cell B19 is .900, which suggests that 90% of the amount of variation in attendance can be attributed to the number of wins and the promotional expenditures. However, as we have already noted, the number of wins would appear to probably account for a larger part of the variation in attendance.

THE COMPETITIVE EDGE |

Competing with Accurate Daily Demand Forecasts at Vermont Gas Systems |

Vermont Gas Systems is a natural gas utility serving approximately 26,000 business, industrial, and residential customers in 13 towns and cities in northwestern Vermont. Demand forecasts are a critical part of Vermont Gas Systems' supply chain that stretches across Canada. Gas is transported from suppliers in western Canada to storage facilities along the Trans-Canada pipeline to Vermont Gas Systems' pipeline. Gas orders must be specified to suppliers at least 24 hours in advance. Enough gas must be ordered to meet customer needs, especially in the winter, but too much will needlessly and expensively tax Trans-Canada Pipelines' facilities. Vermont Gas Systems has storage capacity available for a buffer inventory of only one hour of gas use so an accurate daily forecast of gas demand is essential. Vermont Gas Systems uses regression to forecast daily gas demand. In its forecast models, gas demand is the dependent variable, and factors such as weather information, industrial customer demand, and changing end-use consumer demand are independent variables. During the winter customers use more gas for heat, making an accurate weather forecast a very important factor. Detailed three-day weather forecasts are provided to Vermont Gas Systems five times per day from a weather forecasting service. Individual regression forecasts are developed for 24 large-use industrial and municipal customers such as factories, hospitals, and schools. End-use demand is the total potential capacity of all natural gas appliances in the system. It changes daily as new customers move into a new house, apartment, or business adding new appliances or equipment to the system. Another factor related to end-use demand is water temperature, which will decrease by as much as 25 degrees Fahrenheit within the city water system during the winter. End-use demand and water temperature changes have minimal affect on a daily basis, but their impact is significant over several weeks. To compensate for these factors, the utility uses only the most recent 30 days of demand data in developing its forecast models and updates the models on a weekly basis. The results of the forecast model are interpreted by Vermont Gas Systems and supplemented with its individual knowledge of the supply chain distribution system and customer usage to develop an overall, accurate daily forecast of gas demand. |

Source: M. Flock "Forecasting Winter Daily Gas Demand at Vermont Gas Systems," The Journal of Business Forecasting 13, no. 1 (spring 1994): 2. |

A problem often encountered in multiple regression is *multicollinearity,* or the amount of "overlapping" information about the dependent variable that is provided by several independent variables. This problem usually occurs when the independent variables are highly correlated, as in this example, in which wins and promotional expenditures are both positively correlated (i.e., more wins coincide with higher promotional expenditures and vice versa). (Possibly the athletic department increased promotional expenditures when it thought it would have a better team that would achieve more wins.) Multicollinearity and how to cope with it is a topic that is beyond the scope of this text and this brief section on multiple regression; however most statistics texts will discuss this topic in detail.

**10-24.** Define the different components (*y, x, a,* and *b*) of a linear regression equation.

***10-25.** The Chamber of Commerce publishes guidelines for sales forecasting in small businesses. Summarize these guidelines in a one-page report. Which of the recommendations are unique to small businesses?

***10-26.** Visit the web site of the Forecasting Business Connection. What services does it offer to assist companies in their forecasting efforts?

***10-27.** Link to APICS's survey of forecasting software. What types of forecasting techiniques do most of the companies offer? Link to several company sites (you have to find the URLs) and print out examples of their forecasting "product".

**These exercises require a direct link to a specific Web site. Click Internet Exercises for the list of internet links for these exercises.*