
In a hypothetical PET study, my sample consists of a clinical population that exhibits brain hypermetabolism. I divide my sample into two homogeneous groups. One receives a treatment whereas the other does not (control). My analysis is a 2 x 2 mixed factorial design with treatment (receives treatment, does not receive treatment) as the between-groups factor and treatment status (pretest, posttest) as the within-subjects factor.

The treatment group completes a cognitive task that is expected to reduce the brain hypermetabolism to almost nothing, with metabolism returning to its elevated level within about 2 minutes. In other words, the cognitive task reduces the activity for a short time, after which brain hypermetabolism slowly returns to the elevated levels that distinguish our clinical sample. It is not known whether this regression to pretreatment levels will be linear or nonlinear.

In my design I compare group averages of brain hypermetabolism between the treatment and control groups at pretest and posttest. There should be no difference between the control group's pretest and posttest measurements, but there should be a significant difference in the treatment group.

My question is, is there a standard procedure for obtaining group averages on a linear function? If we know that the brain hypermetabolism returns in <2 minutes, should we average the value of the line across 2 minutes? 1 minute? At the participant level? At the group level?

Alternatively, instead of working out an average, should one calculate the slope of the lines in treatment vs. non-treatment conditions and compare these?
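For concreteness, both candidate summaries (a windowed average and a fitted slope) can be computed per participant before any group-level comparison. The sampling interval, window length, and linear recovery curve below are hypothetical illustrations, not part of any actual PET protocol:

```python
# Two candidate per-participant summaries of a post-treatment recovery curve:
# (1) the mean level over a fixed window, (2) the slope of a least-squares line.
# Hypothetical data: signal sampled every 10 s for 2 minutes.

def window_mean(times, values, t_max=120):
    """Average the signal over the first t_max seconds."""
    pts = [v for t, v in zip(times, values) if t <= t_max]
    return sum(pts) / len(pts)

def ols_slope(times, values):
    """Slope of the least-squares line through the (time, value) pairs."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

times = list(range(0, 130, 10))               # 0, 10, ..., 120 seconds
values = [2.0 + 0.05 * t for t in times]      # hypothetical linear recovery
print(window_mean(times, values))  # mean level across the 2-minute window
print(ols_slope(times, values))    # recovery rate per second
```

Either summary could then be averaged (or compared) at the group level; the slope has the advantage of being insensitive to the choice of window length as long as the recovery really is linear.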

I would much appreciate citations describing such procedures.

## Averaging a linear or nonlinear function treatment - Psychology

At Uber, we test most new features and products with the help of experiments in order to understand and quantify their impact on our marketplace. The analysis of experimental results traditionally focuses on calculating average treatment effects (ATEs).

Since averages reduce an entire distribution to a single number, however, any heterogeneity in treatment effects will go unnoticed. Instead, we have found that calculating quantile treatment effects (QTEs) allows us to effectively and efficiently characterize the full distribution of treatment effects and thus capture the inherent heterogeneity in treatment effects when thousands of riders and drivers interact within Uber’s marketplace.

Besides providing a more nuanced picture of the effect of a new algorithm, this analysis is relevant to our business because people remember negative experiences more strongly than positive ones (see Baumeister et al. (2001) ). In this article, we describe what QTEs are, how exactly they provide additional insights beyond ATEs, why they are relevant for a business such as Uber’s, and how we calculate them.

### Differentiating between QTEs and ATEs

To better understand how QTEs differ from ATEs, let us focus on a specific example. Assume that we want to analyze the impact of an improved algorithm for matching a rider with the most appropriate driver given a specific destination.

For this hypothetical example, assume that the outcome metric of interest is the time it takes the driver to pick up the rider, also called the estimated time to arrival (ETA). Using the potential outcomes framework developed by Professor Donald B. Rubin (see Imbens and Rubin (2015) ), we denote the assignment of rider $i$ to the treatment algorithm with $W_i = 1$, and $W_i = 0$ otherwise. We denote the potential outcomes for each individual as $Y_i(0)$ and $Y_i(1)$. That is, $Y_i(0)$ is the ETA for rider $i$ under the incumbent or control algorithm, and $Y_i(1)$ is the ETA under the new or treatment algorithm. Of course, we only observe one outcome for rider $i$ because we cannot assign them to both the new and the old algorithm. We denote the observed outcome as $Y_i$ with

$$Y_i = W_i \, Y_i(1) + (1 - W_i) \, Y_i(0),$$

with $Y_i(1) \sim F_1$ and $Y_i(0) \sim F_0$. In other words, $F_1$ is the cumulative distribution function (CDF) of ETAs under the new algorithm, and $F_0$ is the CDF of ETAs under the incumbent algorithm.

By far the most widely used way of characterizing the difference in outcomes is by focusing on the (population) ATE, i.e., $\mathrm{ATE} = E\left[Y(1) - Y(0)\right]$.

Even though we do not observe the same rider under both algorithms, assuming the experiment design satisfies a set of regularity assumptions, we can estimate the ATE by comparing the average ETA of those exposed to the new algorithm to the average ETA of those exposed to the incumbent algorithm.
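As a minimal sketch of that comparison of averages (the ETA values below are invented for illustration):

```python
# Under random assignment, the ATE can be estimated as the difference in
# group means. Toy ETAs in minutes, for illustration only.
treatment_etas = [4.1, 5.0, 3.2, 6.3, 4.4]
control_etas = [4.8, 5.1, 3.9, 5.6, 4.6]

def mean(xs):
    return sum(xs) / len(xs)

ate_hat = mean(treatment_etas) - mean(control_etas)
print(round(ate_hat, 2))  # -0.2
```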

Averages are effective at summarizing a lot of information into a single number. We might learn, for example, that the average ETA of the new algorithm is no different from the average ETA of the old algorithm (an ATE of zero). But does this really mean there is no meaningful difference between the two algorithms? Given the large amount of aggregated and anonymized data leveraged by teams at Uber, can we do better than just analyzing the ATE?

#### ATEs do not allow us to understand heterogeneity in treatment effects

Precisely because averages reduce all information into a single number, they can mask some of the subtleties of the underlying distributions. For example, imagine that Figure 1, below, depicts the ETAs across riders for the treatment group (blue solid line) and the control group (red dashed line). Both distributions have the same mean and, thus, the ATE would be zero. However, the figure also reveals that the right-hand tail of ETAs under the new algorithm is much fatter than under the old algorithm. That is, there are a number of riders that experience ETAs far longer than the longest ETAs under the old algorithm. These experiences of longer ETAs under the new algorithm are balanced by a lot of experiences of lower ETAs, as seen by the increased mass towards the left tail of the treatment distribution.

Figure 1: The result of a hypothetical experiment shows that the distribution of ETAs generated by the new algorithm is wider than the one generated under the old algorithm. Both short and long ETAs are more common under the new algorithm.

Note that this heterogeneity in treatment effects across riders need not necessarily be due to observable components like location of the request, time of day, or the weather. If this was the case, we could imagine a slightly more complex experiment analysis that would try to control for these factors and may lead to sufficiently informative ATEs conditional on these observable factors. But in fact, the sheer number of drivers and riders interacting with each other in Uber’s marketplace suggests that there will be heterogeneity in treatment effects that is unexplainable by any observable factors. It is in this scenario where QTEs really provide additional insights not found by simply looking at the ATE, even after conditioning on any imaginable observable factor.

#### Ignore this heterogeneity at your own peril

But even if there are differences in treatment effects across riders, do they matter for the business? Is it business-relevant that some riders experience longer ETAs under the new algorithm, or is all that matters that riders experience no difference in ETAs on average?

Since most riders interact with the Uber platform on multiple occasions, they experience different ETAs over time. And research suggests that negative experiences loom larger in people’s memories than positive experiences. That is, even though a given rider experiences on average the same ETAs generated by the new algorithm, the fact that there will be a number of ETAs that are longer than under the incumbent algorithm may lead to that particular rider *thinking* that ETAs have gotten worse. This implies that accounting for the difference in outcome distributions beyond comparing the average ETAs is important for the business, which is where QTEs enter the picture.

#### Quantile treatment effects allow us to capture this heterogeneity

In order to capture the idea that long ETAs have gotten longer, we define the QTE as the difference between a specific quantile of the outcome distribution under treatment and that same quantile of the outcome distribution under control. That is,

$$\mathrm{QTE}(\tau) = F_1^{-1}(\tau) - F_0^{-1}(\tau).$$

Using the same distributions for ETAs as in Figure 1, Figure 2, below, depicts graphically the QTE for the 95th percentile, i.e., $\mathrm{QTE}(0.95) = F_1^{-1}(0.95) - F_0^{-1}(0.95)$. Note that the QTE defined in this way cannot tell us what the difference in ETA for a *specific* rider is. In other words, the QTE as defined here does not allow us to learn how long the ETA generated by the new algorithm is for a specific rider whose ETA was at the 95th percentile under the incumbent algorithm. It only allows us to compare the 95th percentile of ETAs in the distribution across all riders for the treatment group to the 95th percentile in the distribution across all riders of the control group. Because we do not observe the same rider under both algorithms, we cannot say anything about the correlation between $Y_i(1)$ and $Y_i(0)$ for a given rider (without making further assumptions). Thus, all we can hope to learn from an experiment is information about the marginal distributions of the outcomes of interest.

Figure 2: The 95th percentile ETA under the new algorithm is larger than the 95th percentile ETA under the incumbent algorithm, leading to a positive QTE.

Given the large amounts of data that we can analyze after an experiment, we can, of course, calculate the QTE for many different quantiles, for example from the 1st through the 99th. If we plot all of them in a single figure, the resulting figure might look like Figure 3, below:

Figure 3: Plotting the QTEs on the vertical axis against the quantiles shows that they are negative up until about the 60th percentile and positive above the 60th percentile. This is another way of seeing that both short and long ETAs are more frequent under the new algorithm compared to the incumbent one.

The figure shows that, as seen from the inspection of the two different outcome distributions in Figure 1, the QTE was negative for low quantiles and positive for high quantiles. In other words, short *and* long ETAs are *both* more frequent under the new algorithm.
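A toy version of this pattern can be computed directly from empirical quantiles. The samples below are invented, chosen to have equal means but different spreads; the quantile convention used is linear interpolation (the same as NumPy's default):

```python
# Same-mean samples: the ATE is zero, but the wider treatment distribution
# produces negative QTEs at low quantiles and positive QTEs at high ones.
control = [4.0, 4.5, 5.0, 5.5, 6.0]       # mean 5.0, narrow
treatment = [3.0, 4.0, 5.0, 6.0, 7.0]     # mean 5.0, wide

def quantile(xs, q):
    """Linearly interpolated empirical quantile."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def qte(q):
    return quantile(treatment, q) - quantile(control, q)

print(qte(0.10))  # negative: short ETAs got shorter
print(qte(0.50))  # zero at the median in this example
print(qte(0.90))  # positive: long ETAs got longer
```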

Figures like this have allowed us to gain much more nuanced insights into the impacts of our experiments at Uber. For example, analyses of QTEs have allowed us to detect deteriorations to our marketplace from specific algorithms. These deteriorations occurred at extreme outcomes for a metric and were easily detected in the QTE at the 95th percentile. At the same time, the ATE was small enough so as not to raise any concerns.

#### Calculating QTEs through quantile regression

Similar to using linear regression to calculate ATEs, we can use quantile regression to calculate QTEs (see Koenker (2005) ). One advantage of doing so is the ability to rely on existing literature, cited below, that develops robust inference methods for the estimates, comparable to robust inference for linear regression.

Whereas linear regression models the conditional mean function of the outcome of interest, quantile regression models the conditional quantile function. In order to estimate the QTE, we specify the conditional quantile function

$$Q_{Y_i}(\tau \mid W_i) = \alpha(\tau) + \beta(\tau) \, W_i.$$

Then $\alpha(\tau) = F_0^{-1}(\tau)$ and $\beta(\tau) = F_1^{-1}(\tau) - F_0^{-1}(\tau) = \mathrm{QTE}(\tau)$ (see Koenker (2005) ). Thus, a quantile regression of the outcome of interest on a constant and a treatment indicator allows us to estimate the QTE at the $\tau$-th quantile, just like a linear regression of the same type estimates the ATE.

Similar to linear regression coefficients, quantile regression coefficients can be determined as the solution to a specific optimization problem. For a given quantile $\tau$, the coefficients $\alpha(\tau)$ and $\beta(\tau)$ are the solution to

$$\min_{a, b} \sum_i \rho_\tau\!\left(Y_i - a - b\,W_i\right), \qquad \rho_\tau(u) = u\left(\tau - \mathbb{1}\{u < 0\}\right),$$

and $\mathbb{1}\{\cdot\}$ is the indicator function (see Koenker (2005) ). In contrast to the case of linear regression, the objective function for quantile regression is not differentiable, and there are several different ways of calculating the minimum. One possibility is to write the minimization problem as a linear program and use an appropriate solver. At Uber, however, we solve the optimization through an algorithm suggested by David R. Hunter and Kenneth Lange in an article for the Journal of Computational and Graphical Statistics. By developing an efficient implementation of this algorithm using optimized linear algebra routines, we have found that it scales quite well to the often millions of observations we need to analyze for a single experiment.
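For the constant-plus-dummy specification the objective decouples by group, so a brute-force search over candidate fits drawn from the data values recovers the within-group quantiles. This is only a didactic sketch of the check-loss minimization, not the Hunter–Lange algorithm used in production:

```python
# With a constant and a binary treatment dummy, minimizing the check
# (pinball) loss amounts to taking within-group quantiles: the fitted
# intercept is the control-group tau-quantile and the slope is the QTE.
tau = 0.5
control = [4.0, 4.5, 5.0, 5.5, 6.0]
treatment = [3.0, 4.0, 5.0, 6.0, 7.0]

def rho(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1 if u < 0 else 0))

def loss(a, b):
    """Quantile regression objective for outcome on a constant and W."""
    return (sum(rho(y - a, tau) for y in control) +
            sum(rho(y - a - b, tau) for y in treatment))

# Candidate fits taken from the observed values (sufficient here because
# the control and treatment parts of the objective are separable).
candidates = [(a, t - a) for a in control for t in treatment]
a_hat, b_hat = min(candidates, key=lambda ab: loss(*ab))
print(a_hat, b_hat)  # control-group median and the median QTE
```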

By characterizing quantile regression coefficients as the solution to a minimization problem, we can derive their limiting distributions using the theory for (non-differentiable) M-estimators. With the limiting distribution, we can then derive confidence intervals for the QTEs. Similar to the case for linear regression, a number of robust inference results are available in the literature. Thus, for example, there are results for inference robust to heteroskedasticity ( Kim and White (2003) ), autocorrelation ( Gregory et al. (2018) ), and cluster-robust standard errors ( Parente and Santos Silva (2015) ).

### Moving forward

Quantile treatment effects (QTEs) enable data scientists at Uber to better identify when degradations in our algorithms lead to, for example, longer rider pick-up times, offering a more precise alternative to average treatment effects (ATEs). This increased precision in analyzing the effects of experiments then allows us to refine the mechanics behind estimated times of arrival (ETAs) and other metrics in a more targeted way, leading to an improved rider experience on our platform.

*If tackling some of the industry’s biggest data science challenges interests you, consider applying for a role on our team!*

*Acknowledgments*

*Akshay Jetli, Stephan Langer, and Yash Desai were instrumental for the technical implementation of the ideas discussed in this article. In addition, I profited from many helpful discussions with Sergey Gitlin.*

## Regression diagnostics: testing the assumptions of linear regression

There are **four principal assumptions** which justify the use of linear regression models for purposes of inference or prediction:

**(i) linearity** **and additivity** of the relationship between dependent and independent variables:

(a) The expected value of dependent variable is a straight-line function of each independent variable, holding the others fixed.

(b) The slope of that line does not depend on the values of the other variables.

(c) The effects of different independent variables on the expected value of the dependent variable are additive.

**(ii) statistical independence** of the errors (in particular, no correlation between consecutive errors in the case of time series data)

**(iii) homoscedasticity** (constant variance) of the errors

(a) versus time (in the case of time series data)

(b) versus any independent variable

**(iv) normality** of the error distribution.

If any of these assumptions is violated (i.e., if there are nonlinear relationships between dependent and independent variables or the errors exhibit correlation, heteroscedasticity, or non-normality), then the forecasts, confidence intervals, and scientific insights yielded by a regression model may be (at best) inefficient or (at worst) seriously biased or misleading. More details of these assumptions, and the justification for them (or not) in particular cases, are given on the introduction to regression page.

Ideally your statistical software will automatically provide charts and statistics that test whether these assumptions are satisfied for any given model. Unfortunately, many software packages do not provide such output by default (additional menu commands must be executed or code must be written) and some (such as Excel’s built-in regression add-in) offer only limited options. RegressIt does provide such output, in graphic detail. See this page for an example of output from a model that violates all of the assumptions above yet is likely to be accepted by a naïve user on the basis of a large value of R-squared, and see this page for an example of a model that satisfies the assumptions reasonably well, obtained from the first one by a nonlinear transformation of variables. The normal quantile plots from those models are also shown at the bottom of this page.

You will sometimes see additional (or different) assumptions listed, such as “the variables are measured accurately” or “the sample is representative of the population”, etc. These are important considerations in any form of statistical modeling, and they should be given due attention, although they do not refer to properties of the linear regression equation per se. (Return to top of page.)

**Violations of linearity or additivity** are extremely serious: if you fit a linear model to data which are nonlinearly or nonadditively related, your predictions are likely to be seriously in error, especially when you extrapolate beyond the range of the sample data.

**How to diagnose** : nonlinearity is usually most evident in a plot of **observed versus predicted** **values** or a plot of **residuals versus predicted values** , which are a part of standard regression output. The points should be symmetrically distributed around a diagonal line in the former plot or around a horizontal line in the latter plot, with a roughly constant variance. (The residual-versus-predicted plot is better than the observed-versus-predicted plot for this purpose, because it eliminates the visual distraction of a sloping pattern.) Look carefully for evidence of a "bowed" pattern, indicating that the model makes systematic errors whenever it is making unusually large or small predictions. In multiple regression models, nonlinearity or nonadditivity may also be revealed by systematic patterns in plots of the **residuals versus individual independent variables**.

**How to fix:** consider applying a *nonlinear transformation* to the dependent and/or independent variables *if* you can think of a transformation that seems appropriate. (Don’t just make something up!) For example, if the data are strictly positive, the log transformation is an option. (The logarithm base does not matter--all log functions are the same up to linear scaling--although the natural log is usually preferred because small changes in the natural log are equivalent to percentage changes. See these notes for more details.) If a log transformation is applied to the dependent variable only, this is equivalent to assuming that it grows (or decays) exponentially as a function of the independent variables. If a log transformation is applied to *both* the dependent variable and the independent variables, this is equivalent to assuming that the effects of the independent variables are *multiplicative* rather than additive in their original units. This means that, on the margin, a small *percentage* change in one of the independent variables induces a proportional *percentage* change in the expected value of the dependent variable, other things being equal. Models of this kind are commonly used in modeling price-demand relationships, as illustrated on the beer sales example on this web site.

Another possibility to consider is adding *another regressor* that is a nonlinear function of one of the other variables. For example, if you have regressed Y on X, and the graph of residuals versus predicted values suggests a parabolic curve, then it may make sense to regress Y on both X and X^2 (i.e., X-squared). The latter transformation is possible even when X and/or Y have negative values, whereas logging is not. Higher-order terms of this kind (cubic, etc.) might also be considered in some cases. But don’t get carried away! This sort of "polynomial curve fitting" can be a nice way to draw a smooth curve through a wavy pattern of points (in fact, it is a trend-line option on scatterplots on Excel), but it is usually a terrible way to extrapolate outside the range of the sample data.
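The X-plus-X-squared idea can be sketched directly with the normal equations. The data below are a toy noiseless quadratic, so the true coefficients are recovered exactly; real residual plots would of course show noise around the curve:

```python
# Sketch: if the residual plot suggests a parabola, regress Y on both X and
# X^2. Fit y = b0 + b1*x + b2*x^2 by solving the normal equations
# (a 3x3 system) with Gaussian elimination.

def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2."""
    rows = [[1.0, x, x * x] for x in xs]          # design matrix
    # Normal equations: (A^T A) b = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    b = [0.0] * 3
    for i in (2, 1, 0):                            # back substitution
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    return b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [2.0 + 3.0 * x + 0.5 * x * x for x in xs]     # noiseless quadratic
b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2)
```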

Finally, it may be that you have overlooked some *entirely different independent variable* that explains or corrects for the nonlinear pattern or interactions among variables that you are seeing in your residual plots. In that case the shape of the pattern, together with economic or physical reasoning, may suggest some likely suspects. For example, if the strength of the linear relationship between Y and X_{1} depends on the level of some other variable X_{2}, this could perhaps be addressed by creating a new independent variable that is the product of X_{1} and X_{2}. In the case of time series data, if the trend in Y is believed to have changed at a particular point in time, then the addition of a *piecewise linear* trend variable (one whose string of values looks like 0, 0, …, 0, 1, 2, 3, … ) could be used to fit the kink in the data. Such a variable can be considered as the product of a trend variable and a dummy variable. Again, though, you need to beware of overfitting the sample data by throwing in artificially constructed variables that are poorly motivated. At the end of the day you need to be able to interpret the model and explain (or sell) it to others. (Return to top of page.)

**Violations of independence** are potentially very serious in *time series regression* models: serial correlation in the errors (i.e., correlation between consecutive errors or errors separated by some other number of periods) means that there is room for improvement in the model, and extreme serial correlation is often a symptom of a badly mis-specified model. Serial correlation (also known as "autocorrelation") is sometimes a byproduct of a violation of the linearity assumption, as in the case of a simple (i.e., straight) trend line fitted to data which are growing exponentially over time.

Independence can also be violated in non-time-series models if errors tend to always have the same sign under particular conditions, i.e., if the model systematically underpredicts or overpredicts what will happen when the independent variables have a particular configuration.

**How to diagnose:** The best test for serial correlation is to look at a **residual time series plot** (residuals vs. row number) and a table or plot of **residual autocorrelations** . (If your software does not provide these by default for time series data, you should figure out where in the menu or code to find them.) Ideally, most of the residual autocorrelations should fall within the 95% confidence bands around zero, which are located at roughly plus-or-minus 2-over-the-square-root-of-n, where n is the sample size. Thus, if the sample size is 50, the autocorrelations should be between +/- 0.3. If the sample size is 100, they should be between +/- 0.2. Pay especially close attention to significant correlations at the first couple of lags and in the vicinity of the seasonal period, because these are probably not due to mere chance and are also fixable. The *Durbin-Watson statistic* provides a test for significant residual autocorrelation at lag 1: the DW stat is approximately equal to 2(1-a) where a is the lag-1 residual autocorrelation, so ideally it should be close to 2.0--say, between 1.4 and 2.6 for a sample size of 50.
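These rules of thumb are easy to compute directly. The residuals below are artificial, constructed to alternate in sign so that they show strong negative lag-1 autocorrelation and a Durbin-Watson value near 4:

```python
# Residual autocorrelation diagnostics: the lag-1 autocorrelation, the
# +/- 2/sqrt(n) confidence band, and the Durbin-Watson approximation
# DW ~ 2(1 - a), where a is the lag-1 autocorrelation.
import math

def lag1_autocorr(res):
    n = len(res)
    m = sum(res) / n
    num = sum((res[i] - m) * (res[i - 1] - m) for i in range(1, n))
    den = sum((r - m) ** 2 for r in res)
    return num / den

def durbin_watson(res):
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    return num / sum(r * r for r in res)

residuals = [(-1) ** i * 1.0 for i in range(50)]   # alternating signs
a = lag1_autocorr(residuals)
band = 2 / math.sqrt(len(residuals))               # ~0.28 for n = 50
print(a, band, durbin_watson(residuals))           # a well outside the band
```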

**How to fix:** Minor cases of *positive* serial correlation (say, lag-1 residual autocorrelation in the range 0.2 to 0.4, or a Durbin-Watson statistic between 1.2 and 1.6) indicate that there is some room for fine-tuning in the model. Consider adding lags of the dependent variable and/or lags of some of the independent variables. Or, if you have an ARIMA+regressor procedure available in your statistical software, try adding an AR(1) or MA(1) term to the regression model. An AR(1) term adds a lag of the dependent variable to the forecasting equation, whereas an MA(1) term adds a lag of the forecast error. If there is significant correlation at lag 2, then a 2nd-order lag may be appropriate.

If there is significant *negative* correlation in the residuals (lag-1 autocorrelation more negative than -0.3 or DW stat greater than 2.6), watch out for the possibility that you may have *overdifferenced* some of your variables. Differencing tends to drive autocorrelations in the negative direction, and too much differencing may lead to artificial patterns of negative correlation that lagged variables cannot correct for.

If there is significant correlation at the *seasonal* period (e.g. at lag 4 for quarterly data or lag 12 for monthly data), this indicates that seasonality has not been properly accounted for in the model. Seasonality can be handled in a regression model in one of the following ways: (i) *seasonally adjust* the variables (if they are not already seasonally adjusted), or (ii) use *seasonal lags and/or seasonally differenced variables* (caution: be careful not to overdifference!), or (iii) add *seasonal dummy variables* to the model (i.e., indicator variables for different seasons of the year, such as MONTH=1 or QUARTER=2, etc.) The dummy-variable approach enables *additive seasonal adjustment* to be performed as part of the regression model: a different additive constant can be estimated for each season of the year. If the dependent variable has been logged, the seasonal adjustment is multiplicative. (Something else to watch out for: it is possible that although your dependent variable is already seasonally adjusted, some of your independent variables may not be, causing their seasonal patterns to leak into the forecasts.)

*Major cases* of serial correlation (a Durbin-Watson statistic well below 1.0, autocorrelations well above 0.5) usually indicate a fundamental structural problem in the model. You may wish to reconsider the transformations (if any) that have been applied to the dependent and independent variables. It may help to stationarize all variables through appropriate combinations of differencing, logging, and/or deflating.

To test for non-time-series violations of independence, you can look at plots of the residuals versus independent variables or plots of residuals versus row number in situations where the rows have been sorted or grouped in some way that depends (only) on the values of the independent variables. The residuals should be randomly and symmetrically distributed around zero under all conditions, and in particular **there should be no correlation between consecutive errors no matter how the rows are sorted**, as long as it is on some criterion that does not involve the dependent variable. If this is not true, it could be due to a violation of the linearity assumption or due to bias that is explainable by omitted variables (say, interaction terms or dummies for identifiable conditions).

**Violations of homoscedasticity** (which are called "heteroscedasticity") make it difficult to gauge the true standard deviation of the forecast errors, usually resulting in confidence intervals that are too wide or too narrow. In particular, if the variance of the errors is increasing over time, confidence intervals for out-of-sample predictions will tend to be unrealistically narrow. Heteroscedasticity may also have the effect of giving too much weight to a small subset of the data (namely the subset where the error variance was largest) when estimating coefficients.

**How to diagnose:** look at a plot of **residuals versus predicted values** and, in the case of time series data, a plot of **residuals versus time** . Be alert for evidence of residuals that grow larger either as a function of time or as a function of the predicted value. To be really thorough, you should also generate plots of residuals versus independent variables to look for consistency there as well. Because of imprecision in the coefficient estimates, the errors may tend to be *slightly* larger for forecasts associated with predictions or values of independent variables that are extreme in both directions, although the effect should not be too dramatic. What you hope *not* to see are errors that systematically get larger in one direction by a significant amount.

**How to fix:** If the dependent variable is strictly positive and if the residual-versus-predicted plot shows that the size of the errors is proportional to the size of the predictions (i.e., if the errors seem consistent in percentage rather than absolute terms), a log transformation applied to the dependent variable may be appropriate. In time series models, heteroscedasticity often arises due to the effects of inflation and/or real compound growth. Some combination of *logging and/or deflating* will often stabilize the variance in this case. Stock market data may show periods of increased or decreased volatility over time. This is normal and is often modeled with so-called ARCH (auto-regressive conditional heteroscedasticity) models in which the error variance is fitted by an autoregressive model. Such models are beyond the scope of this discussion, but a simple fix would be to work with shorter intervals of data in which volatility is more nearly constant. Heteroscedasticity can also be a byproduct of a significant violation of the linearity and/or independence assumptions, in which case it may also be fixed as a byproduct of fixing those problems.

*Seasonal patterns* in the data are a common source of heteroscedasticity in the errors: unexplained variations in the dependent variable throughout the course of a season may be consistent in percentage rather than absolute terms, in which case larger errors will be made in seasons where activity is greater, which will show up as a seasonal pattern of changing variance on the residual-vs-time plot. A log transformation is often used to address this problem. For example, if the seasonal pattern is being modeled through the use of dummy variables for months or quarters of the year, a log transformation applied to the dependent variable will convert the coefficients of the dummy variables to multiplicative adjustment factors rather than additive adjustment factors, and the errors in predicting the logged variable will be (roughly) interpretable as percentage errors in predicting the original variable. Seasonal adjustment of all the data prior to fitting the regression model might be another option.

If a log transformation has already been applied to a variable, then (as noted above) additive rather than multiplicative seasonal adjustment should be used, if it is an option that your software offers. Additive seasonal adjustment is similar in principle to including dummy variables for seasons of the year. Whether or not you should perform the adjustment outside the model rather than with dummies depends on whether you want to be able to study the seasonally adjusted data all by itself and on whether there are unadjusted seasonal patterns in some of the independent variables. (The dummy-variable approach would address the latter problem.) (Return to top of page.)

**Violations of normality** create problems for determining whether model coefficients are significantly different from zero and for calculating confidence intervals for forecasts. Sometimes the error distribution is "skewed" by the presence of a few large outliers. Since parameter estimation is based on the minimization of *squared* error, a few extreme observations can exert a disproportionate influence on parameter estimates. Calculation of confidence intervals and various significance tests for coefficients are all based on the assumptions of normally distributed errors. If the error distribution is significantly non-normal, confidence intervals may be too wide or too narrow.

Technically, the normal distribution assumption is not necessary if you are willing to assume the model equation is correct and your only goal is to estimate its coefficients and generate predictions in such a way as to minimize mean squared error. The formulas for estimating coefficients require no more than that, and some references on regression analysis do not list normally distributed errors among the key assumptions. But generally we are interested in making inferences about the model and/or estimating the probability that a given forecast error will exceed some threshold in a particular direction, in which case distributional assumptions are important. Also, a significant violation of the normal distribution assumption is often a "red flag" indicating that there is some other problem with the model assumptions and/or that there are a few unusual data points that should be studied closely and/or that a better model is still waiting out there somewhere.

**How to diagnose:** the best test for normally distributed errors is a **normal probability plot** or **normal quantile plot** of the residuals. These are plots of the fractiles of the error distribution versus the fractiles of a normal distribution having the same mean and variance. If the distribution is normal, the points on such a plot should fall close to the diagonal reference line. A *bow-shaped* pattern of deviations from the diagonal indicates that the residuals have excessive *skewness* (i.e., they are not symmetrically distributed, with too many large errors in *one* direction). An S-shaped pattern of deviations indicates that the residuals have excessive *kurtosis*--i.e., there are either too many or too few large errors in *both* directions. Sometimes the problem is revealed to be that there are a few data points on one or both ends that deviate significantly from the reference line ("outliers"), in which case they should get close attention.
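The coordinates of such a plot are easy to compute yourself; a minimal sketch using SciPy (on simulated residuals, so the specific numbers are illustrative only):

```python
import numpy as np
from scipy import stats

# Residuals from a hypothetical fitted model (here simulated as normal noise).
rng = np.random.default_rng(1)
residuals = rng.normal(0, 2, 200)

# probplot returns the theoretical normal quantiles (osm), the ordered
# residuals (osr), and a least-squares line (slope, intercept, r) fit to them.
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")

# For normally distributed residuals the points fall near the reference line
# and the correlation r of the plot is close to 1; bowed or S-shaped patterns
# pull r down.
print(round(r, 3))
```

Passing `plot=plt` (with matplotlib) to `probplot` draws the plot directly instead of returning only the coordinates.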

There are also a variety of **statistical tests for normality**, including the Kolmogorov-Smirnov test, the Shapiro-Wilk test, the Jarque-Bera test, and the Anderson-Darling test. The Anderson-Darling test (which is the one used by RegressIt) is generally considered to be the best, because it is specific to the normal distribution (unlike the K-S test) and it looks at the whole distribution rather than just the skewness and kurtosis (like the J-B test). But all of these tests are excessively "picky" in this author’s opinion. Real data rarely has errors that are perfectly normally distributed, and it may not be possible to fit your data with a model whose errors do not violate the normality assumption at the 0.05 level of significance. It is usually better to focus more on violations of the other assumptions and/or the influence of a few outliers (which may be mainly responsible for violations of normality anyway) and to look at a normal probability plot or normal quantile plot and draw your own conclusions about whether the problem is serious and whether it is systematic.
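Most of these tests are available in SciPy; a short sketch contrasting well-behaved residuals with clearly skewed ones (simulated data, so treat the p-values as illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal_res = rng.normal(0, 1, 500)       # residuals that satisfy the assumption
skewed_res = rng.exponential(1, 500)     # strongly right-skewed residuals

for name, res in [("normal", normal_res), ("skewed", skewed_res)]:
    sw_stat, sw_p = stats.shapiro(res)       # Shapiro-Wilk
    jb_stat, jb_p = stats.jarque_bera(res)   # Jarque-Bera (skewness + kurtosis)
    ad = stats.anderson(res, dist="norm")    # Anderson-Darling
    # critical_values[2] is the 5% critical value for the A-D statistic
    rejected_at_5pct = ad.statistic > ad.critical_values[2]
    print(name, round(sw_p, 4), round(jb_p, 4), rejected_at_5pct)
```

Note that `stats.anderson` reports critical values rather than a p-value, and that with large samples even mild departures will be flagged, which is exactly the "pickiness" described above.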

Here is an example of a bad-looking normal quantile plot (an S-shaped pattern with P=0 for the A-D stat, indicating highly significant non-normality) from the beer sales analysis on this web site:

…and here is an example of a good-looking one (a linear pattern with P=0.5 for the A-D stat, indicating no significant departure from normality):

**How to fix:** violations of normality often arise either because (a) the *distributions of the dependent and/or independent variables* are themselves significantly non-normal, and/or (b) the *linearity assumption* is violated. In such cases, a nonlinear transformation of variables might cure both problems. In the case of the two normal quantile plots above, the second model was obtained by applying a natural log transformation to the variables in the first one.
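The effect of such a transformation is easy to see on a right-skewed variable; a minimal sketch (using simulated lognormal data, where the log is normal by construction):

```python
import numpy as np
from scipy import stats

# A right-skewed variable: the raw values fail a normality test badly,
# while their natural logs do not.
rng = np.random.default_rng(3)
raw = rng.lognormal(mean=3.0, sigma=0.8, size=300)
logged = np.log(raw)

_, p_raw = stats.shapiro(raw)
_, p_log = stats.shapiro(logged)

# p_raw is essentially zero (strong rejection); p_log should be far larger.
print(p_raw, p_log)
```

In practice one would transform the variable *in the regression* and then re-examine the residual quantile plot, since it is the errors, not the variables themselves, that must be normal.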

The dependent and independent variables in a regression model do not need to be normally distributed by themselves--only the prediction errors need to be normally distributed. (In fact, independent variables do not even need to be random, as in the case of trend or dummy or treatment or pricing variables.) But if the distributions of some of the variables that are random are extremely asymmetric or long-tailed, it may be hard to fit them into a linear model whose errors will be normally distributed, and explaining the shape of their distributions may be an interesting topic all by itself. Keep in mind that the normal error assumption is usually justified by appeal to the central limit theorem, which holds in the case where many random variations are added together. If the underlying sources of randomness are not interacting additively, this argument fails to hold.

Another possibility is that there are two or more *subsets* of the data having *different statistical properties*, in which case separate models should be built, or else some data should merely be excluded, provided that there is some a priori criterion that can be applied to make this determination.

## Results

The benchmark model predicted 16.6% of the variance in post-treatment HRSD. The ensemble model predicted an additional 8% of the variance in post-treatment HRSD (see Table 1). The most important predictor variables were pre-treatment depression total score, psychiatric comorbidity, dysthymia, several depression symptom items, usage of several Deprexis modules, disability, treatment credibility, and availability of therapists (see Fig. 2).

Fig. 2. Partial dependence plots for the top 16 predictors of post-treatment interviewer-rated depression symptoms.

Table 1. Prediction of post-treatment depression by linear regression model including only pre-treatment assessment of outcome (benchmark), additional variance explained beyond benchmark model by ensemble model (model gain), and total variance explained.

95% CIs for prediction *R*² were based on the standard error formula applied to the 10 × 10 cross-validation estimates; 95% CIs for gain (the increase in predicted *R*² over benchmark) were estimated by bootstrap.
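The two interval procedures can be sketched as follows (a hypothetical illustration with made-up *R*² estimates standing in for the 10 × 10 cross-validation results; none of these numbers come from the study):

```python
import numpy as np

# Hypothetical: 100 R^2 estimates from 10 x 10 repeated cross-validation,
# for the benchmark and ensemble models (values invented for illustration).
rng = np.random.default_rng(4)
r2_benchmark = rng.normal(0.166, 0.03, 100)
r2_ensemble = rng.normal(0.246, 0.03, 100)

# 95% CI for prediction R^2: standard-error formula over the CV estimates.
mean_r2 = r2_ensemble.mean()
se = r2_ensemble.std(ddof=1) / np.sqrt(len(r2_ensemble))
ci_r2 = (mean_r2 - 1.96 * se, mean_r2 + 1.96 * se)

# 95% bootstrap CI for the gain: resample the paired per-fold differences.
gains = r2_ensemble - r2_benchmark
boot = [rng.choice(gains, size=len(gains), replace=True).mean()
        for _ in range(2000)]
ci_gain = (np.percentile(boot, 2.5), np.percentile(boot, 97.5))
print(ci_r2, ci_gain)
```

One caveat worth noting: repeated-CV estimates are not independent (folds share training data), so the plain standard-error formula tends to understate the true uncertainty; the bootstrap over paired fold differences is the more conservative of the two.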

Not surprisingly, partial dependence plots indicated a fairly linear relationship for pre-treatment HRSD and dysthymia: as pre-treatment depression/dysthymia increased, so did predicted post-treatment HRSD. Psychiatric comorbidity had a more curvilinear relationship: post-treatment HRSD gently increased with increasing comorbidity until relatively high levels of comorbidity, where post-treatment HRSD increased much more quickly. Higher levels of specific symptoms of depression, including slowness, psychic anxiety, and weight loss, were associated with higher post-treatment depression; disability related to psychiatric symptoms had a similar association.

Notably, usage of the relationships, acceptance, and relaxation modules was identified as an important predictor. Using these modules for at least 30 min was associated with a 1.2-point greater reduction in HRSD score (all other predictors being equal), which is approximately one-quarter of the mean outcome difference observed for Deprexis-treated *v.* wait-list groups. As can be seen in many of the partial dependence plots for the 16 highest-impact predictors (Fig. 2), associations between predictors and post-treatment HRSD were often non-linear and effects were relatively small (with the exception of the first three variables). Importance scores are also presented separately for the random forests and elastic net models in online Supplementary materials, section 6.0.

### Symptom-related disability

Consistent with prior work, to create a disability outcome, the work, social, and family disability questions (three items in total) from the SDS were averaged to form a single index of symptom-related disability. The benchmark model with pre-treatment disability predicted 20.4% of the variance in post-treatment symptom-related disability. The ensemble model predicted an additional 5% of the variance in post-treatment disability (Table 1).

As can be seen in Fig. 3, the pre-treatment disability composite had the strongest importance score, which was approximately equivalent to the importance of disability in the family domain. Nevertheless, several other variables also contributed to the prediction of disability. Several QIDS-SR items were identified as important predictors, including disinterest, early insomnia, and fatigue. More time spent on both the relaxation and cognitive modules was associated with lower post-treatment disability (the benefit tapered off after approximately 60 min on each module). A higher percentage of Hispanic ethnicity in the participant's zip code and fewer years in therapy were also associated with lower disability. Importance scores are also presented separately for the random forests and elastic net models in online Supplementary materials, section 6.1.

Fig. 3. Partial dependence plots for the top 16 predictors of post-treatment disability.

### Depression-related well-being

The benchmark model with pre-treatment well-being (positive affect) predicted 17.8% of the variance in post-treatment symptom-related well-being. The ensemble model explained an additional 11.6% of the variance in post-treatment well-being (Table 1). As can be seen in Fig. 4, not surprisingly, higher pre-treatment well-being was associated with higher post-treatment well-being.

Fig. 4. Partial dependence plots for the top 16 predictors of post-treatment well-being (low positive affect) symptoms.

Comorbid psychopathology was also an important predictor of post-treatment well-being. The most important forms of comorbidity included mania symptoms, dysthymia, disinterest, and ill temper. Higher perceived treatment credibility and greater confidence that treatment would help were both associated with greater improvements in well-being in a fairly linear fashion (see Fig. 4). Younger age was associated with a better outcome, as were paternal anxiety and mental illness. Higher use of the relaxation module was also associated with better post-treatment well-being. Importance scores are also presented separately for the random forests and elastic net models in online Supplementary materials, section 6.2.

### Deprexis module usage

A final plot highlights the relative impact of module usage for each of the three outcomes (see Fig. 5). To generate these scores, the module importance scores for each outcome were scaled to sum to 1. A few notable patterns emerge. First, the most important module appeared to be the relaxation module: greater usage was associated with fewer depression symptoms, less disability, and more well-being. In addition, usage of the acceptance and relationship modules was most important for the prediction of HRSD depression symptoms. The cognitive module was important for predicting reductions in disability, as was the diagnosis module. Time spent on most of the other modules was not strongly associated with symptom improvement, at least for the average user. It is important to note that usage of all modules did factor (weakly) into the final prediction, and modules that were not important for the average user might nonetheless be very important when predicting the outcomes for some individuals.

Fig. 5. Importance of Deprexis module usage for predicting post-treatment depression, disability, and well-being (positive affect).

## Statistical Modeling, Causal Inference, and Social Science

I happened to come across this article today. It’s hardly obscure—it has over 3000 citations, according to Google scholar—but it was new to me.

It’s a wonderful article. You should read it right away.

OK, click on the above link and read the article.

Done? OK, then read on.

You know that saying, that every good idea in statistics was published fifty years earlier in psychometrics? That's what's happening here. Cronbach talks about the importance of interactions, the difficulty of estimating them from data, and the way in which researchers manage to find what they're looking for, even in settings where the data are too weak to really show such patterns. He even talks about the piranha problem in the context of "Aptitude x Treatment interactions":

In a world where researchers are babbling on about so-called moderators and mediators as if they know what they’re doing, Cronbach is a voice of sanity.

And this was all fifty years ago! All this sounds a lot like Meehl, and Meehl is great, but Cronbach adds value by giving lots of specific applied examples.

In the article, Cronbach makes a clear connection between interactions and the replication crisis arising from researcher degrees of freedom, a point that I rediscovered years later—in my paper on the connection between varying treatment effects and the crisis of unreplicable research. Too bad I hadn’t been aware of this work earlier.

Hmmm . . . let me check the Simmons, Nelson, and Simonsohn (2011) article that introduced the world to the useful term "researcher degrees of freedom": Do they cite Cronbach? No. Interesting that even psychology researchers were unaware of that important work in psychometrics. I'm not slamming Simmons et al.—I hadn't known about Cronbach either!—I'm just noting that, even within psychology, his work was not so well known.

Going through the papers that referred to Cronbach (1975), I came across this book chapter from Denny Borsboom, Rogier A. Kievit, Daniel Cervone and S. Brian Hood, which begins:

Anybody who has some familiarity with the research literature in scientific psychology has probably thought, at one time or another, ‘Well, all these means and correlations are very interesting, but what do they have to do with me, as an individual person?’. The question, innocuous as it may seem, is a deep and complicated one. In contrast to the natural sciences, where researchers can safely assume that, say, all electrons are exchangeable save properties such as location and momentum, people differ from each other. . . .

The problem permeates virtually every subdiscipline of psychology, and in fact may be one of the reasons that progress in psychology has been limited.

Given the magnitude of the problems involved in constructing person-specific theories and models, let alone in testing them, it is not surprising that scholars have sought to integrate inter-individual differences and intra-individual dynamics in a systematic way. . . .

The call for integration of research traditions dates back at least to Cronbach’s (1957) . . .:

Correlational psychology studies only variance among organisms; experimental psychology studies only variance among treatments. A united discipline will study both of these, but it will also be concerned with the otherwise neglected interactions between organismic and treatment variables . . .

Not much has changed in the basic divisions in scientific psychology since Cronbach (1957) wrote his presidential address. True, today we have mediation and moderation analyses, which attempt to integrate inter-individual differences and intra-individual process, and in addition are able to formulate random effects models that to some extent incorporate inter-individual differences in an experimental context; but by and large, research designs are characterized by a primary focus on the effects of experimental manipulations or on the structure of associations of inter-individual differences, just as was the case in 1957. . . .

In experimental research, the researcher typically hopes to demonstrate the existence of causal effects of experimental manipulations (which typically form the levels of the ‘independent variable’) on a set of properties which are treated as dependent on the manipulations (their levels form the ‘dependent variable’). . . .

One interesting and very general fact about experimental research is that such claims are never literally true. The literal reading of conclusions like Bargh et al., very prevalent among untrained readers of scientific work, is that all participants in the experimental condition were slower than all those in the control condition. But that, of course, is incorrect – otherwise there would be no need for the statistics. . . .

From a statistical perspective, it is commonplace to speak of an average treatment effect. But, when considered from the perspective of understanding human behavior, it’s a big deal that effects typically appear only in the aggregate and not on individuals.

The usual story we tell is that the average treatment effect (which we often simply call “the treatment effect”) is real—indeed, we often model it as constant across people and over time—and then we label deviations from this average as “noise.”

But I’ve increasingly come to the conclusion that we need to think of treatment effects as varying: thus, the difficulty in estimating treatment effects is *not* merely a problem of “finding a signal in noise” that can be solved by increasing our sample size; rather, it is a fundamental challenge.

To use rural analogies, when we’re doing social and behavioral science, we’re not looking for a needle in a haystack; rather, we’re trying to catch a slippery fish that keeps moving.

All this is even harder in political science, economics, or sociology. An essential aspect of *social* science is that it understands people not in isolation but within groups. Thus, if psychology ultimately requires a different model for each person (or a model that accounts for differences between people), the social sciences require a different model for each configuration of people (or a model that accounts for dependence of outcomes on the configuration).

To put it another way, if any theory of psychology implies 7,700,000,000 theories (corresponding to the population of the world today, and for now ignoring models of people who are no longer alive), then political science, economics, etc. imply 2^7,700,000,000 – 1 theories (corresponding to all possible subsets of the population, excluding the empty set, for which no social science is necessary). That’s an extreme statement—obviously we work with much simpler theories that merely have implications for each individual or each subset of the population—but the point is that such theories are either explicit or implied in any model of social science that is intended to have general application.


Steger, M. F., & Kashdan, T. B. (2009). Depression and everyday social activity, belonging, and well-being. *Journal of Counseling Psychology, 56*(2), 289–300. https://doi.org/10.1037/a0015416.

Sundar, S. S., & Limperos, A. M. (2013). Uses and grats 2.0: New gratifications for new media. *Journal of Broadcasting & Electronic Media, 57*(4), 504–525. https://doi.org/10.1080/08838151.2013.845827.

Thompson, R. J., Mata, J., Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Gotlib, I. H. (2010). Maladaptive coping, adaptive coping, and depressive symptoms: Variations across age and depressive state. *Behaviour Research and Therapy, 48*(6), 459–466. https://doi.org/10.1016/j.brat.2010.01.007.

Tromholt, M. (2016). The Facebook experiment: Quitting Facebook leads to higher levels of well-being. *Cyberpsychology, Behavior, and Social Networking, 19*(11), 661–666. https://doi.org/10.1089/cyber.2016.0259.

Twenge, J. M., Joiner, T. E., Rogers, M. L., & Martin, G. N. (2017). Increases in depressive symptoms, suicide-related outcomes, and suicide rates among U.S. adolescents after 2010 and links to increased new media screen time. *Clinical Psychological Science, 6*(1), 3–17. https://doi.org/10.1177/2167702617723376.

Valenzuela, S., Park, N., & Kee, K. F. (2009). Is there social capital in a social network site?: Facebook use and college student's life satisfaction, trust, and participation. *Journal of Computer-Mediated Communication, 14*(4), 875–901. https://doi.org/10.1111/j.1083-6101.2009.01474.x

Valkenburg, P. M., & Peter, J. (2007). Online communication and adolescent well-being: Testing the stimulation versus the displacement hypothesis. *Journal of Computer-Mediated Communication, 12*(4), 1169–1182. https://doi.org/10.1111/j.1083-6101.2007.00368.x.

Wittchen, H.-U., Jacobi, F., Rehm, J., Gustavsson, A., Svensson, M., Jönsson, B., . Steinhausen, H.-C. (2011). The size and burden of mental disorders and other disorders of the brain in Europe 2010. *European Neuropsychopharmacology, 21*(9), 655–679. https://doi.org/10.1016/j.euroneuro.2011.07.018.

Zillmann, D. (1988a). Mood management through communication choices. *American Behavioral Scientist, 31*, 327–340.

Zillmann, D. (1988b). Mood management: Using entertainment to full advantage. In L. A. Donohew, H. E. Sypher, & E. T. Higgins (Eds.), *Communication, social cognition, and affect* (pp. 147–172). Hillsdale: Sage.

The `analyze` function, available in the psycho package, transforms a model fit object into user-friendly outputs.

Summarizing an analyzed object returns a dataframe that can be easily saved and included in reports. It includes standardized coefficients, as well as bootstrapped confidence intervals (CIs) and effect sizes.

| Variable | Coef | SE | t | df | Coef.std | SE.std | p | Effect_Size | CI_lower | CI_higher |
|---|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 25.52 | 4.24 | 6.02 | 31.50 | 0.00 | 0.00 | < .001*** | Very Small | 17.16 | 33.93 |
| Emotion_ConditionNeutral | 6.14 | 2.67 | 2.30 | 895.13 | 0.10 | 0.04 | < .05* | Very Small | 0.91 | 11.37 |
| Subjective_Valence | 0.06 | 0.03 | 1.68 | 898.47 | 0.09 | 0.06 | = 0.09° | Very Small | -0.01 | 0.12 |
| Emotion_ConditionNeutral:Subjective_Valence | 0.16 | 0.05 | 3.22 | 896.27 | 0.13 | 0.04 | < .01** | Very Small | 0.06 | 0.26 |

## Linear versus nonlinear classifiers

In this section, we show that the two learning methods Naive Bayes and Rocchio are instances of linear classifiers, perhaps the most important group of text classifiers, and contrast them with nonlinear classifiers. To simplify the discussion, we will only consider two-class classifiers in this section and define a linear classifier as a two-class classifier that decides class membership by comparing a linear combination of the features to a threshold.

In two dimensions, a linear classifier is a line. Five examples are shown in Figure 14.8. These lines have the functional form $w_1 x_1 + w_2 x_2 = b$. The classification rule of a linear classifier is to assign a document to $c$ if $w_1 x_1 + w_2 x_2 > b$ and to $\overline{c}$ if $w_1 x_1 + w_2 x_2 \le b$. Here, $(x_1, x_2)^T$ is the two-dimensional vector representation of the document and $(w_1, w_2)^T$ is the parameter vector that defines (together with $b$) the decision boundary. An alternative geometric interpretation of a linear classifier is provided in Figure 15.7.

We can generalize this 2D linear classifier to higher dimensions by defining a hyperplane as we did in Equation 140, repeated here as Equation 144:

$$\vec{w}^{\,T}\vec{x} = b \qquad (144)$$

The assignment criterion then is: assign to $c$ if $\vec{w}^{\,T}\vec{x} > b$ and to $\overline{c}$ if $\vec{w}^{\,T}\vec{x} \le b$. We call a hyperplane that we use as a linear classifier a *decision hyperplane*.

The corresponding algorithm for linear classification in $M$ dimensions is shown in Figure 14.9. Linear classification at first seems trivial given the simplicity of this algorithm. However, the difficulty is in training the linear classifier, that is, in determining the parameters $\vec{w}$ and $b$ based on the training set. In general, some learning methods compute much better parameters than others, where our criterion for evaluating the quality of a learning method is the effectiveness of the learned linear classifier on new data.
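
As a concrete illustration of this decision rule, here is a minimal Python sketch (function and variable names are our own, not taken from the text):

```python
def apply_linear_classifier(w, b, x):
    """Assign x to class c (True) if w^T x > b, else to the complement (False)."""
    score = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return score > b

# Two-dimensional example: decision line x1 + x2 = 1
print(apply_linear_classifier([1.0, 1.0], 1.0, [0.9, 0.4]))  # True: above the line
print(apply_linear_classifier([1.0, 1.0], 1.0, [0.2, 0.3]))  # False: below the line
```

The same function works unchanged in $M$ dimensions, since only the dot product and threshold comparison are involved.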

We now show that Rocchio and Naive Bayes are linear classifiers. To see this for Rocchio, observe that a vector $\vec{x}$ is on the decision boundary if it has equal distance to the two class centroids:

$$|\vec{\mu}(c_1) - \vec{x}| = |\vec{\mu}(c_2) - \vec{x}| \qquad (145)$$

Some basic arithmetic shows that this corresponds to a linear classifier with normal vector $\vec{w} = \vec{\mu}(c_1) - \vec{\mu}(c_2)$ and $b = 0.5 \times (|\vec{\mu}(c_1)|^2 - |\vec{\mu}(c_2)|^2)$ (Exercise 14.8).
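
Assuming the centroid formulas above, a short Python sketch with made-up centroids illustrates the computation:

```python
def rocchio_linear_params(mu1, mu2):
    """Normal vector w = mu(c1) - mu(c2) and offset b = 0.5*(|mu(c1)|^2 - |mu(c2)|^2)."""
    w = [a - c for a, c in zip(mu1, mu2)]
    b = 0.5 * (sum(a * a for a in mu1) - sum(c * c for c in mu2))
    return w, b

mu1, mu2 = [2.0, 0.0], [0.0, 2.0]
w, b = rocchio_linear_params(mu1, mu2)
# A point equidistant from both centroids lies exactly on the boundary: w^T x == b
x = [1.0, 1.0]
print(sum(wi * xi for wi, xi in zip(w, x)) == b)  # True
```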

We can derive the linearity of Naive Bayes from its decision rule, which chooses the category $c$ with the largest $\hat{P}(c|d)$ (Figure 13.2) where:

$$\hat{P}(c|d) \propto \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k|c) \qquad (146)$$

and $n_d$ is the number of tokens in the document that are part of the vocabulary. Denoting the complement category as $\overline{c}$, we obtain for the log odds:

$$\log \frac{\hat{P}(c|d)}{\hat{P}(\overline{c}|d)} = \log \frac{\hat{P}(c)}{\hat{P}(\overline{c})} + \sum_{1 \le k \le n_d} \log \frac{\hat{P}(t_k|c)}{\hat{P}(t_k|\overline{c})} \qquad (147)$$

We choose class $c$ if the odds are greater than 1 or, equivalently, if the log odds are greater than 0. It is easy to see that Equation 147 is an instance of Equation 144 for $w_i = \log[\hat{P}(t_i|c)/\hat{P}(t_i|\overline{c})]$, $x_i$ = number of occurrences of $t_i$ in $d$, and $b = -\log[\hat{P}(c)/\hat{P}(\overline{c})]$. Here, the index $i$, $1 \le i \le M$, refers to terms of the vocabulary (not to positions in $d$ as $k$ does), and $\vec{w}$ and $\vec{x}$ are $M$-dimensional vectors. So in log space, Naive Bayes is a linear classifier.
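
A hedged Python sketch of this reduction (the two-term vocabulary, priors and conditional probabilities below are invented for illustration):

```python
import math

def nb_linear_params(prior_c, prior_cbar, p_t_c, p_t_cbar):
    """Naive Bayes as a linear classifier in log space:
    w_i = log(P(t_i|c)/P(t_i|c_bar)), b = -log(P(c)/P(c_bar))."""
    w = [math.log(pc / pcbar) for pc, pcbar in zip(p_t_c, p_t_cbar)]
    b = -math.log(prior_c / prior_cbar)
    return w, b

# Hypothetical two-term vocabulary with equal priors
w, b = nb_linear_params(0.5, 0.5, [0.6, 0.1], [0.2, 0.3])
x = [2, 1]  # term occurrence counts in the document
log_odds = sum(wi * xi for wi, xi in zip(w, x)) - b
print(log_odds > 0)  # choose c if the log odds are greater than 0
```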

| $t_i$ | $w_i$ | $d_1$ | $d_2$ | $t_i$ | $w_i$ | $d_1$ | $d_2$ |
|---|---|---|---|---|---|---|---|
| prime | 0.70 | 0 | 1 | dlrs | -0.71 | 1 | 1 |
| rate | 0.67 | 1 | 0 | world | -0.35 | 1 | 0 |
| interest | 0.63 | 0 | 0 | sees | -0.33 | 0 | 0 |
| rates | 0.60 | 0 | 0 | year | -0.25 | 0 | 0 |
| discount | 0.46 | 1 | 0 | group | -0.24 | 0 | 0 |
| bundesbank | 0.43 | 0 | 0 | dlr | -0.24 | 0 | 0 |

Worked example. Table 14.4 defines a linear classifier for the category interest in Reuters-21578 (see Section 13.6). We assign document $d_1$, ``rate discount dlrs world'', to interest since $\vec{w}^{\,T}\vec{d}_1 = 0.67 + 0.46 - 0.71 - 0.35 = 0.07 > 0 = b$. We assign $d_2$, ``prime dlrs'', to the complement class (not in interest) since $\vec{w}^{\,T}\vec{d}_2 = 0.70 - 0.71 = -0.01 \le b$. For simplicity, we assume a simple binary vector representation in this example: 1 for occurring terms, 0 for non-occurring terms. End worked example.
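
The worked example can be checked mechanically; this Python sketch encodes the weights of Table 14.4 and scores the two documents under the same binary representation:

```python
# Weights from Table 14.4 (binary document vectors: 1 if the term occurs, 0 otherwise)
weights = {"prime": 0.70, "rate": 0.67, "interest": 0.63, "rates": 0.60,
           "discount": 0.46, "bundesbank": 0.43,
           "dlrs": -0.71, "world": -0.35, "sees": -0.33, "year": -0.25,
           "group": -0.24, "dlr": -0.24}

def score(doc_terms):
    """w^T x for a binary document vector given as a list of occurring terms."""
    return sum(weights[t] for t in doc_terms if t in weights)

d1 = ["rate", "discount", "dlrs", "world"]
d2 = ["prime", "dlrs"]
print(round(score(d1), 2))  # 0.07  > b = 0: assign to interest
print(round(score(d2), 2))  # -0.01 <= 0: assign to the complement class
```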

Figure 14.10: A linear problem with noise. In this hypothetical web page classification scenario, Chinese-only web pages are solid circles and mixed Chinese-English web pages are squares. The two classes are separated by a linear class boundary (dashed line, short dashes), except for three noise documents (marked with arrows).

Figure 14.10 is a graphical example of a *linear problem*, which we define to mean that the underlying distributions $P(d|c)$ and $P(d|\overline{c})$ of the two classes are separated by a line. We call this separating line the *class boundary*. It is the ``true'' boundary of the two classes and we distinguish it from the decision boundary that the learning method computes to approximate the class boundary.

As is typical in text classification, there are some noise documents in Figure 14.10 (marked with arrows) that do not fit well into the overall distribution of the classes. In Section 13.5 (page 13.5 ), we defined a noise feature as a misleading feature that, when included in the document representation, on average increases the classification error. Analogously, a noise document is a document that, when included in the training set, misleads the learning method and increases classification error. Intuitively, the underlying distribution partitions the representation space into areas with mostly homogeneous class assignments. A document that does not conform with the dominant class in its area is a noise document.

Noise documents are one reason why training a linear classifier is hard. If we pay too much attention to noise documents when choosing the decision hyperplane of the classifier, then it will be inaccurate on new data. More fundamentally, it is usually difficult to determine which documents are noise documents and therefore potentially misleading.

If there exists a hyperplane that perfectly separates the two classes, then we call the two classes linearly separable. In fact, if linear separability holds, then there is an infinite number of linear separators (Exercise 14.4), as illustrated by Figure 14.8.

Figure 14.8 illustrates another challenge in training a linear classifier. If we are dealing with a linearly separable problem, then we need a criterion for selecting among all decision hyperplanes that perfectly separate the training data. In general, some of these hyperplanes will do well on new data, some will not.

An example of a nonlinear classifier is kNN. The nonlinearity of kNN is intuitively clear when looking at examples like Figure 14.6 . The decision boundaries of kNN (the double lines in Figure 14.6 ) are locally linear segments, but in general have a complex shape that is not equivalent to a line in 2D or a hyperplane in higher dimensions.

Figure 14.11 is another example of a nonlinear problem: there is no good linear separator between the distributions $P(d|c)$ and $P(d|\overline{c})$ because of the circular ``enclave'' in the upper left part of the graph. Linear classifiers misclassify the enclave, whereas a nonlinear classifier like kNN will be highly accurate for this type of problem if the training set is large enough.

If a problem is nonlinear and its class boundaries cannot be approximated well with linear hyperplanes, then nonlinear classifiers are often more accurate than linear classifiers. If a problem is linear, it is best to use a simpler linear classifier.

- Prove that the number of linear separators of two classes is either infinite or zero.

## METHODS

This meta-review aimed to systematically aggregate the most recent, top-tier evidence for the role of “lifestyle factors” in the prevention and treatment of mental disorders, following the PRISMA statement to ensure comprehensive and transparent reporting 19 . Systematic searches were conducted on February 3, 2020 of the following databases: Allied and Complementary Medicine (AMED), PsycINFO, Ovid MEDLINE, Health Management Information Consortium, EMBASE and the NHS Economic Evaluation and Health Technology Assessment databases.

The following PICOS search algorithm was used: Participants [‘mental health or psychological well-being or psychological outcomes or mental well-being or psychiat* or mental illness* or mental disorder* or depress* or mood disorder* or affective disorder* or anxi* or panic or obsessive compulsive or OCD or ADHD or attention deficit or attentional deficit or phobi* or bipolar type or bipolar disorder* or psychosis or psychotic or schizophr* or schizoaffective or antipsychotic* or post traumatic* or personality disorder* or stress disorder* or dissociative disorder or antidepress* or antipsychotic*.ti] Interventions/Exposures [physical activity or exercis* or sport* or walking or intensity activity or resistance training or muscle or sedentary or screen time or screentime or aerobic or fitness or diet* or nutri* or food* or vegan or vege* or meat or carbohy* or fibre or sugar* or adipos* or vitamin* or fruit* or sleep* or insomn* or circad* or smoke* or smoking or tobacco or nicotine or healthy or obes* or weight or bodyweight or body mass or BMI or health behav* or behavior change or behavior change or lifestyle*.ti] Outcomes [‘meta-analy* or metaanaly* or meta reg* or metareg* or systematic review* or Mendel* or meta-review or reviews or umbrella review or updated review*.ti] Study design [‘prospective or protect* or inciden* or onset or prevent* or cohort or predict* or risk or longitudinal or randomized or randomised or mendel* or bidirectional or controlled or trial* or causal'].

Separate searches of the Cochrane Database of Systematic Reviews and Google Scholar were also conducted to identify additional articles.

### Eligibility criteria

The lifestyle factors examined were those pertaining to physical activity, diet, sleep and smoking.

“Physical activity” was considered in the broadest sense, including overall physical activity levels, structured exercise training interventions, and also studies examining the absence of physical activity, i.e. sedentary behavior. “Diet” focused on dietary food intake/interventions, and did not include studies evaluating specific nutrient treatments (as these have been already reviewed extensively in this journal 20 ) or those examining blood levels of individual vitamins/minerals/fatty acids (as blood levels of these nutrients are influenced by many genetic and environmental factors, independent from dietary intake 21, 22 ). “Sleep” was examined as general sleep patterns, quality or quantity, along with studies examining either the impact of sleep disorders (i.e., insomnia) on risk of mental illnesses, or the efficacy of non-pharmacological interventions directly targeting sleep to improve psychiatric symptoms. The term “smoking” was used only in reference to tobacco consumption, from personal usage or passive exposure, rather than illicit drugs, as the known psychoactive effects of these latter substances have been reviewed extensively in this journal 23 .

Mental disorders eligible to be included in this meta-review were mood disorders (moderate or severe depression and bipolar disorder), psychotic disorders (including schizophrenia and related conditions), anxiety and stress-related disorders, dissociative disorders, personality disorders, and ADHD. We excluded psychiatric conditions which are directly characterized by adverse health behaviors (i.e., eating disorders and alcohol or substance use disorders) along with other neurodevelopmental disorders (e.g., autism, intellectual disability) and neurodegenerative disorders (e.g., dementia), as these were considered beyond the scope of this review.

Protective factors were examined using two sources of data. First, we searched for meta-analyses of longitudinal data that examined relationships between the various lifestyle factors and prospective risk/onset of mental illness. Eligible meta-analyses were those presenting suitable quantitative data – as adjusted or raw odds ratios (ORs), risk ratios (RRs) or hazard ratios (HRs) – on how baseline status of behavioral variables influences the prospective risk of mental illness, including diagnosed psychiatric conditions and clinically significant symptoms (using established cutoffs on validated screening instruments, or based on percentile cutoffs of psychiatric symptom scores).

The second source of data used for examining protective factors were any Mendelian randomization (MR) studies of the link between lifestyle factors and mental illness. Briefly, MR is a causal inference method that can be used to estimate the effect of an exposure (X) on an outcome (Y) whilst minimizing bias from confounding and reverse causation 24, 25 . Suitable genetic instruments (usually single nucleotide polymorphisms, SNPs) are identified through genome-wide association studies (GWAS). Individuals carrying the effect allele of the variant have higher (or lower) levels of X on average than those without the effect alleles. Following Mendel's laws of segregation and independent assortment, the genetic variants are inherited randomly at conception, and are inherited independently of confounding lifestyle factors 26 . Therefore, MR can be considered somewhat analogous to a randomized controlled trial (RCT) of behavioral factors in the prevention of mental illness, as genetic variants randomly predispose individuals to experience different levels of these factors 26 . As genes also remain unchanged throughout the life course, they are also not altered by the outcome of interest, thus reducing bias from reverse causation 26 . Therefore, while meta-analyses of prospective cohort studies are useful for identifying the overall strength and directionality of associations, the MR analyses were used to further infer the causal nature of the observed relationships.

The evidence for lifestyle interventions in the treatment of people with diagnosed mental disorders was examined using two different sources of data, but both based on meta-analyses of RCTs (typically considered the top-tier of evidence in health intervention research). First, we searched for existing meta-reviews of meta-analyses of RCTs published in the last five years, for each lifestyle factor, providing quantitative effects of physical activity, diet, smoking cessation or non-pharmacological sleep interventions on psychiatric symptoms in people with mental illness. Second, for the lifestyle factors that were not covered within the existing meta-reviews, we sought out meta-analyses of RCTs examining their impact (using the search strategy above), and synthesized the evidence from the meta-analyses using a methodology derived from a previous meta-review 20 . For meta-analyses with mixed samples, only those in which at least 75% of the sample examined the eligible mental illnesses (as described above) were included.

### Data extraction

A systematic tool was applied to each eligible meta-analysis/MR study to extract the relevant data on the association of lifestyle factors with risk of mental illness, or the effects of lifestyle interventions on psychiatric outcomes. Results of eligible meta-reviews were extracted narratively, summarized from their respective articles.

For meta-analyses of longitudinal studies, the strength and direction of the prospective associations between lifestyle factors and mental illness were quantified categorically, and thus extracted as ORs, HRs or RRs, with 95% confidence intervals (CIs).

For meta-analyses of RCTs of lifestyle interventions in mental illness, effect size data were quantified as a continuous variable (i.e., magnitude of effect on psychiatric symptoms) and thus extracted as standardized mean differences (SMDs), Cohen's d or Hedges' g. These were then classified as small (<0.4), moderate (0.4-0.8), or large (>0.8).
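The cutoff scheme can be stated as a small Python sketch (the function name is ours; boundary values of exactly 0.4 and 0.8 are treated as moderate, an assumption the text leaves open):

```python
def classify_smd(smd):
    """Classify an absolute standardized mean difference using the cutoffs
    applied in this meta-review: small (<0.4), moderate (0.4-0.8), large (>0.8)."""
    smd = abs(smd)
    if smd < 0.4:
        return "small"
    if smd <= 0.8:
        return "moderate"
    return "large"

print(classify_smd(0.35))  # small
print(classify_smd(0.55))  # moderate
print(classify_smd(0.90))  # large
```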

For all meta-analyses, data on the degree of between-study heterogeneity (quantified as I 2 values) were also extracted, where reported.

In cases where multiple eligible meta-analyses examined a specific lifestyle factor in the risk/treatment of the same mental disorder, the most recent was used preferentially. Where older meta-analyses featured >25% more studies than the newer versions and contained important, novel findings from unique analyses not captured in the most recent versions, these were also extracted and presented alongside the newer findings. In cases where two MR studies had examined the same lifestyle factor for the same mental health outcome, both studies (regardless of recency or sample size) were included and reviewed.

We also extracted relevant study characteristics where reported, including number of pooled comparisons within meta-analyses (n), sample size (N), details on the specifics of lifestyle exposure or intervention examined, and sample features. The results of key subgroup/sensitivity analyses showing how different age groups, illnesses or outcomes examined, or different types of exposure/interventions modified the effect of the specific lifestyle factor were extracted as well. For the purposes of providing a concise summary of the literature, only the findings from secondary analyses which provided important, unique insights into the evidence were extracted.

### Quality assessment of included studies

The National Institutes of Health (NIH)'s Quality Assessment Tool for Systematic Reviews and Meta-Analyses was used to assess the quality of the included meta-analyses. This tool rates meta-analyses for adequacy of the search question, specification of inclusion and exclusion criteria, systematic search, screening of papers, quality assessment and summaries of included studies, and tests for publication bias and heterogeneity. In accordance with previous meta-reviews using the NIH tool 27 , the quality of included meta-analyses was categorized as “good” (7 or 8), “fair” (4-6), or “poor” (0-3).

As no consensus tool exists for determining the quality of MR and meta-review studies, these were omitted from formal quality assessment.

## Pharmacokinetics made easy 9: Non-linear pharmacokinetics

**What is meant by non-linear pharmacokinetics?** When the dose of a drug is increased, we expect that the concentration at steady state will increase proportionately, i.e. if the dose rate is increased or decreased, say, two-fold, the plasma drug concentration will also increase or decrease two-fold. However, for some drugs, the plasma drug concentration changes either more or less than would be expected from a change in dose rate. This is known as non-linear pharmacokinetic behaviour and can cause problems when adjusting doses.

**What causes non-linear pharmacokinetic behaviour?**

In a previous article (Article 1 - `Clearance' Aust Prescr 1988;11:12-3), it was shown that the steady state blood concentration (C_{ss}) is a function of both the dose rate and the clearance of the drug.

**equation 1**

C_{ss} = (F × dose rate) / clearance

where F is the bioavailability.

In most dosing situations, total clearance (CL) is determined by protein binding and intrinsic clearance (CL_{int}) (Article 4 - `How drugs are cleared by the liver' Aust Prescr 1990;13:88-9).

**equation 2**

CL = f_{u} × CL_{int}

where f_{u} is the fraction unbound to protein.

Combining equations 1 and 2, the determinants of C_{ss} during chronic dosing are

**equation 3**

C_{ss} = (F × dose rate) / (f_{u} × CL_{int})

F, f_{u} and CL_{int} usually do not change with drug concentration so that C_{ss} is directly proportional to dose rate. However, there are some situations where this predictable relationship between dose rate and C_{ss} breaks down due to dose dependency of F, f_{u} and/or CL_{int}.
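
The proportionality between dose rate and C_{ss} under linear kinetics can be illustrated with a short Python sketch (all parameter values below are invented for illustration):

```python
def css_linear(dose_rate, F, fu, cl_int):
    """Steady state concentration under linear kinetics (equation 3):
    Css = F * dose_rate / (fu * CL_int)."""
    return F * dose_rate / (fu * cl_int)

# Illustrative values: doubling the dose rate doubles Css
c1 = css_linear(100.0, 0.8, 0.1, 40.0)
c2 = css_linear(200.0, 0.8, 0.1, 40.0)
print(c2 / c1)  # 2.0
```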

1. Saturation of elimination mechanisms causes a change in intrinsic clearance

*Drug metabolism*: The metabolism of drugs is carried out by a variety of enzymes such as cytochrome P450 and N-acetyltransferase. The dependence of the rate of an enzyme reaction on substrate concentration is given by the Michaelis-Menten equation and is illustrated in Fig. 1.

**equation 4**

v = (V_{max} × S) / (K_{m} + S)

where v is the velocity of reaction, S is the substrate concentration, V_{max} is the maximum velocity at very high substrate concentrations and K_{m} is the substrate concentration at half V_{max}. K_{m} is a measure of the affinity of the substrate for the enzyme.

In pharmacokinetic terms, v is equivalent to the rate of elimination (v = C_{u} x CL) and S is equivalent to the unbound drug concentration (C_{u}). Equation 4 can then be rearranged to give a function for intrinsic clearance (see also equation 1).

**equation 5**

CL_{int} = v / C_{u} = V_{max} / (K_{m} + C_{u})

where V_{max} is the maximum rate of metabolism at high concentrations of unbound drug and K_{m} is the unbound drug concentration at half V_{max}.

Usually, unbound plasma drug concentration (C_{u}) in the therapeutic range is very small compared to the K_{m} for the metabolising enzyme and equation 5 approximates to

**equation 6**

CL_{int} = V_{max} / K_{m}

CL_{int} is then independent of unbound drug concentration, which is therefore linear with dose. In some cases, unbound drug concentration is close to or above K_{m} at therapeutic doses, and the kinetics begin to become non-linear (see Fig. 1). In this situation, CL_{int} decreases as unbound drug concentration increases (see equation 5) and steady state drug concentration increases more than proportionately with dose (equation 3). *At high drug concentrations, the maximal rate of metabolism is reached and cannot be exceeded. Under these conditions, a constant amount of drug is eliminated per unit time no matter how much drug is in the body. Zero order kinetics then apply, rather than the usual first order kinetics where a constant **proportion** of the drug in the body is eliminated per unit time.* Some examples of drugs which exhibit non-linear kinetic behaviour are phenytoin, ethanol, salicylate and, in some individuals, theophylline.

*Phenytoin* : Phenytoin exhibits marked saturation of metabolism at concentrations in the therapeutic range (10-20 mg/L) (Fig. 2). Consequently, small increases in dose result in large increases in total and unbound steady state drug concentration. As an example, for a patient with typical K_{m} of 5 mg/L (total drug) and V_{max} of 450 mg/day, steady state concentrations at doses of 300, 360 and 400 mg/day would be 10.0, 20.0 and 40.0 mg/L respectively (Fig. 2). Thus, small dosage adjustments are required to achieve phenytoin concentrations in the therapeutic range of 10-20 mg/L.
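
These numbers follow from setting the rate of elimination equal to the dose rate at steady state: dose rate = V_{max} × C_{ss} / (K_{m} + C_{ss}), which rearranges to C_{ss} = K_{m} × dose rate / (V_{max} − dose rate). A minimal Python sketch reproduces the worked example (units as in the text):

```python
def css_saturable(dose_rate, vmax, km):
    """Steady state concentration with Michaelis-Menten elimination:
    dose rate = Vmax*Css/(Km+Css)  =>  Css = Km*dose_rate/(Vmax - dose_rate)."""
    if dose_rate >= vmax:
        raise ValueError("dose rate at or above Vmax: no steady state is reached")
    return km * dose_rate / (vmax - dose_rate)

# Km = 5 mg/L (total drug), Vmax = 450 mg/day, as in the example above
for dose in (300, 360, 400):
    print(dose, css_saturable(dose, 450, 5))  # 10.0, 20.0, 40.0 mg/L
```

Note how the last 40 mg/day increment doubles the steady state concentration, which is why only small dosage adjustments are appropriate in this range.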

A second consequence is that, because clearance decreases, apparent half-life increases from about 12 hours at low phenytoin concentrations to as long as a week or more at high concentrations. This means that

i. the time to reach steady state can be as long as 1-3 weeks at phenytoin concentrations near the top of the therapeutic range

ii. in the therapeutic range, the phenytoin concentration fluctuates little over a 24 hour period allowing once daily dosing and sampling for drug concentration monitoring at any time between doses

iii. if dosing is stopped with concentrations in the toxic range, phenytoin concentration initially falls very slowly and there may be little change over a number of days.

*Alcohol*: Alcohol is an interesting example of saturable metabolism. The K_{m} for alcohol is about 0.01 g% (100 mg/L) so that concentrations in the range of pharmacological effect are well above the K_{m}. The V_{max} for ethanol metabolism is about 10 g/hour (12.8 mL/hour) and it can be calculated (see legend to Fig. 2) that at the common legal driving limit of 0.05 g%, the rate of alcohol metabolism is 8.3 g/hour. This amount of alcohol is contained in 530 mL light beer, 236 mL standard beer, 88 mL wine or 27 mL spirit. Higher rates of ingestion will result in further accumulation.
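
The quoted elimination rate can be checked against the Michaelis-Menten equation (equation 4) with these values; a minimal Python sketch (0.05 g% = 0.5 g/L):

```python
def elimination_rate(vmax, km, conc):
    """Michaelis-Menten elimination rate v = Vmax*C/(Km + C)."""
    return vmax * conc / (km + conc)

# Vmax ~ 10 g/hour, Km ~ 0.1 g/L; blood alcohol of 0.05 g% = 0.5 g/L
rate = elimination_rate(10.0, 0.1, 0.5)
print(round(rate, 1))  # 8.3 g/hour, as quoted in the text
```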

*Renal excretion*: In Article 7 (`Clearance of drugs by the kidneys' Aust Prescr 1992;15:16-9), it was shown that renal drug clearance is the sum of filtration clearance plus secretion clearance minus reabsorption. Clearance by glomerular filtration is a passive process which is not saturable, but secretion involves saturable drug binding to a carrier. Even when secretion is saturated, filtration continues to increase linearly with plasma drug concentration. The extent to which saturation of renal secretion results in non-linear pharmacokinetics depends on the relative importance of secretion and filtration in the drug's elimination. Because of the baseline of filtration clearance, saturation of renal secretion does not usually cause clinically important problems.

2. Saturation of first pass metabolism causing an increase in bioavailability

After oral administration, the drug-metabolising enzymes in the liver are exposed to relatively high drug concentrations in the portal blood. For drugs with high hepatic extraction ratios, e.g. alprenolol, an increased dose can result in saturation of the metabolising enzymes and an increase in bioavailability (F). Steady state drug concentration then increases more than proportionately with dose (equation 3). Other drugs with saturable first pass metabolism are tropisetron and paroxetine.

3. Saturation of protein binding sites causing a change in fraction of drug unbound in plasma

The fraction unbound of a drug in plasma (f_{u}) is given by

**equation 7**

f_{u} = 1 / (1 + K_{a} × P_{u})

where K_{a} is the affinity constant for binding to a protein such as albumin or α1-acid glycoprotein and P_{u} is the concentration of free (unbound) protein, i.e. protein that does not have drug bound to it. The total concentration of albumin in plasma is about 0.6 mM (40 g/L) and the concentration of α1-acid glycoprotein is about 0.015 mM. Usually drug concentrations are well below those of the binding proteins and unbound protein (P_{u}) approximates to total protein (P_{T}). Then, f_{u} depends only on the affinity constant and the total concentration of protein binding sites, and remains constant with changes in drug concentration. In a few cases (e.g. salicylate, phenylbutazone, diflunisal), therapeutic drug concentrations are high enough to start to saturate albumin binding sites so that unbound protein concentration decreases and f_{u} increases, while total drug concentration increases less than proportionately with increases in dose (equation 3). This occurs more commonly for drugs such as disopyramide, which bind to α1-acid glycoprotein, because of the lower concentration of binding protein.

What are the practical consequences of saturable protein binding? From equation 3, it can be seen that as f_{u} increases, total drug concentration at steady state decreases. However, f_{u} does not affect the steady state concentration of the unbound drug. In other words, unbound concentration will increase linearly with dose, but total drug concentration will increase less than proportionately. This is illustrated in Fig. 3 for the case of disopyramide. This dissociation between total and unbound drug concentration causes difficulties in therapeutic drug monitoring, where total drug concentration is nearly always measured. Total drug concentration may appear to plateau despite increasing dose (Fig. 3), leading to further dose increases. However, unbound concentrations and drug effect do increase linearly with dose - if this is not realised, inappropriate dose increases with consequent toxicity can occur.