# Logit model with unequal observation times


## Introduction


I would like to begin this ML journey with this message:

• There are many types of models in the area of logistic modeling.
• Repeated measures data comes in two different formats: (1) wide or (2) long.
• This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc.

Prompted by an article by King and Zeng, many researchers worry about whether they can legitimately use conventional logistic regression for data in which events are rare. Although King and Zeng accurately described the problem and proposed an appropriate solution, there are still a lot of misconceptions about this issue.

The problem is not specifically the rarity of events, but rather the possibility of a small number of cases on the rarer of the two outcomes. If your sample is large but contains only 20 events, you have a problem. If your sample is smaller but contains many events, you may be OK. The problem is that maximum likelihood estimation of the logistic model is well known to suffer from small-sample bias.

And the degree of bias is strongly dependent on the number of cases in the less frequent of the two categories. So even with a very large sample, if there are only 20 events, you may have substantial bias. King and Zeng proposed an alternative estimation method to reduce the bias. Their method is very similar to another method, known as penalized likelihood, that is more widely available in commercial software.

Also called the Firth method, after its inventor, penalized likelihood is a general approach to reducing small-sample bias in maximum likelihood estimation. In the case of logistic regression, penalized likelihood also has the attraction of producing finite, consistent estimates of regression parameters when the maximum likelihood estimates do not even exist because of complete or quasi-complete separation.
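A minimal numeric sketch (with invented data) of why maximum likelihood fails under complete separation: because x perfectly predicts y, the log-likelihood improves without bound as the slope grows, so no finite maximum exists.

```python
import math

# Invented, completely separated data: every x < 0 has y = 0, every x > 0 has y = 1.
x = [-2, -1, 1, 2]
y = [0, 0, 1, 1]

def loglik(beta):
    """Log-likelihood of a logistic model with slope beta and no intercept."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-beta * xi))
        total += math.log(p if yi == 1 else 1 - p)
    return total

# The log-likelihood keeps rising toward 0 as beta grows: the MLE does not exist.
print([round(loglik(b), 6) for b in (1, 5, 10, 20)])
```

Penalized likelihood tames exactly this behavior by adding a penalty that pulls the maximizer back to a finite value.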

Unlike exact logistic regression (another estimation method for small samples, but one that can be very computationally intensive), penalized likelihood takes almost no additional computing time compared to conventional maximum likelihood.

In fact, a case could be made for always using penalized likelihood rather than conventional maximum likelihood for logistic regression, regardless of the sample size. Does anyone have a counter-argument? Reference: Gary King and Langche Zeng. I am thinking of using Poisson regression in cases where the event is rare, since p (the probability of success) is very small and n (the sample size) is large.

This has no advantage over logistic regression. Better to use exact logistic regression (if computationally practical) or the Firth method. Can you please explain further why you say Poisson regression has no advantage over logistic regression when we have rare events?

When events are rare, the Poisson distribution provides a good approximation to the binomial distribution. If you have 50 events in your data, is the firth option the appropriate one if your goal is not only to model the likelihood of the event but also the median time to event?
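The claim that the Poisson distribution approximates the binomial when events are rare is easy to check numerically; this sketch uses an invented n and p:

```python
import math

# When p is small and n is large, Binomial(n, p) is close to Poisson(n * p).
n, p = 10_000, 0.002          # rare events: expected count n * p = 20
lam = n * p

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in (10, 20, 30):
    print(k, round(binom_pmf(k), 6), round(poisson_pmf(k), 6))
```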

The Firth method can be helpful in reducing small-sample bias in Cox regression, which can arise when the number of events is small. The Firth method can also be helpful with convergence failures in Cox regression, although these are less common than in logistic regression.

Which method would be appropriate, multiple logistic or Poisson regression? There is no reason to consider Poisson regression. For logistic regression, I would use the Firth method, especially if there is a risk of separation: if the response variable is presence of testicular cancer and one of the covariates is sex, for example, the data are quasi-completely separated, since no females can have the event.

The rarity of the event reduces the power of this test. I fully agree with Paul Allison. We have done extensive simulation studies with small samples, comparing the Firth method with ordinary maximum likelihood estimation. Regarding point estimates, the Firth method was always superior to ML. Furthermore, it turned out that confidence intervals based on the profile penalized likelihood were more reliable in terms of coverage probability than those based on standard errors.

Profile penalized likelihood confidence intervals are available in some software packages. Hi, I am a PhD student in biostatistics. I have a data set with a large number of cases but only a small number of events. I used the method of weighting for rare events from Gary King's article.

My goal was to estimate ORs in a logistic regression; unfortunately the standard errors and confidence intervals are large, and there is little difference from ordinary logistic regression.

I don't know why; what is your idea? My guess is that penalized likelihood will give you very similar results. But the effective sample size here is much closer to the number of events than to the total number of cases, so you may simply not have enough events to get reliable estimates of the odds ratios. Please clarify this for me: I have a sample of observations with an equal number of goods and bads. Is this a good way of building the model, or should I reduce the bads? There would be nothing to gain in doing that, and you want to use all the data you have.

If the event I am analyzing is extremely rare but the available sample is large (5 million), such that there is a usable number of events in the sample, would logistic regression be appropriate? There are a number of independent variables that are of interest to us in understanding the event. If an even larger sample were needed, how much larger should it be at a minimum? Yes, logistic regression should be fine in this situation.

Again, what matters is the number of the rarer event, not the proportion. I have a small data set of patients, with only 25 events. Because the dataset is small, I am able to do an exact logistic regression. A few questions: Is there a variable limit for inclusion in my model? Does the 10-events-per-predictor rule that is often suggested still apply? With exact logistic regression I would relax that criterion to 5 events per predictor: thus, no more than 5 predictors in your regression. I benefited a lot from your explanation of exact logistic regression, and I read your reply on this comment that you would relax the criteria to only 5 events per predictor instead of 10. I am in this situation right now and I badly need your help.

I will have to be able to defend that, and I want to know whether there is evidence behind the relaxed 5-events-per-predictor rule with exact regression. Below are two references that you might find helpful. One argues for relaxing the 10-events-per-predictor rule, while the other claims that even more events may be needed. Both papers focus on conventional ML methods rather than exact logistic regression. The first is by Vittinghoff, E.; the second is by Courvoisier, D., Combescure, Agoritsas, Gayet-Ageron, and colleagues.

I also wanted to confirm this with you: if I have gender as a predictor (male, female), is this considered as TWO variables and not one, right?

Thank you very much for your help. I guess I gave you a wrong example for my question. I wanted to know: if a categorical variable has more than two levels, would it still be counted as one variable for the sake of the rule we are discussing? Also, do we have to stick to the 5 events per predictor if we use Firth, or can we violate the rule completely? And if it is OK to violate it, do I have to mention that as a limitation?

What matters is the number of coefficients. So a categorical variable with 5 categories would have four coefficients. Keep in mind, however, that this is only the roughest rule of thumb. But it is not sufficient to determine whether the study has adequate power to test the hypotheses of interest.
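The coefficient counting described above can be sketched as follows (a toy calculation; the model composition is invented):

```python
# A categorical predictor with k levels contributes k - 1 coefficients
# (one level is absorbed into the reference category).
def n_coefficients(levels):
    return levels - 1

# Invented model: two continuous predictors plus one 5-level categorical,
# fit on a dataset with 25 events.
total_coefficients = 2 + n_coefficients(5)
events = 25
events_per_coefficient = events / total_coefficients
print(total_coefficients, round(events_per_coefficient, 2))  # 6 and 4.17
```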

My question is this: from that group of cases with its 2,000 or so events, what is the appropriate sample size for analysis? A second follow-up question: is it OK for my cutoff value in logistic regression to be so low, well below the conventional 0.5? Thanks so much for any help you can provide! My question is, do you really need to sample? Nowadays, most software packages can easily handle hundreds of thousands of cases for logistic regression.

If you definitely want to sample, I would take all cases with events, then take a simple random sample of the non-events; the more the better. This kind of disproportionate stratified sampling on the dependent variable is perfectly OK for logistic regression. As I said in the blog post, what matters is the number of events, not the proportion. How do you then adjust the LR probability prediction for an event so that it reflects the probability in all the traffic (all rows) and not just in the sample?

Thanks, Oded. There is a simple formula for adjusting the intercept. Let r be the proportion of events in the sample and let p be the proportion in the population. Let b be the intercept you estimate and B be the adjusted intercept. The formula is B = b + log[p(1 − r) / (r(1 − p))]. I am using a data set of roughly 86,000 observations to study business start-up. Most of the responses are dichotomous. I used logistic regression and the results show all 10 independent variables as highly significant.
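The intercept correction can be sketched in code. `adjust_intercept` is a made-up helper name; the formula is the standard correction for disproportionate (case-control style) sampling on the outcome, B = b + log[p(1 − r) / (r(1 − p))]:

```python
import math

def adjust_intercept(b, r, p):
    """Shift a logit intercept estimated on a sample with event proportion r
    so predictions reflect a population with event proportion p."""
    return b + math.log((p * (1 - r)) / (r * (1 - p)))

# A model fit on a balanced 50/50 sample, for a population with 1% events,
# has its intercept shifted down by about 4.6.
print(round(adjust_intercept(0.0, 0.5, 0.01), 3))
```

Only the intercept changes; the slope coefficients are unaffected by sampling on the outcome.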

I tried rare-events logistic regression and got the same result. People are complaining about the highly significant results and saying they may be biased. What would you suggest?

### Logit model with unequal observation time: the example dataset

Richard Williams: As I will note later, that may just be too monstrous to do what you want. I suggest looking at Allison's green Sage book on fixed effects models. Since the introduction of factor variables, you should not use the xi: prefix; use factor variables instead. Type -help fvvarlist- for details. Also, use xtreg, fe instead of areg. I am sure that will be a bit challenging for Stata, if it can do it at all. As Allison explains, the dummy variable approach is called Unconditional Maximum Likelihood.

It can be more or less OK for xtreg. But, as Allison and Stephen both point out, the estimates are biased in a logit analysis. Turning to your 2nd approach: here you are using Conditional Maximum Likelihood, which Allison says is the right way to go. Still, you may have some problems. I would suggest starting really simple, e.g. with just one or two predictors, and adding variables one at a time. Maybe you will find that everything works OK until x7 is added. Analyzing a subset of the data may also help to speed up the problem-solving process.

Here is the basic problem. Suppose there were 11,000 records for a case and 5,000 records where the event occurred. That means you would have 11,000-choose-5,000 possible combinations, and you would have to compare the probability of the combination that did happen with all the combinations that did not happen. Maybe Stata has some super-efficient way of doing that, but if so it hasn't shown up so far in the current calculations.
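The combinatorial blow-up described above is easy to verify; `math.comb` counts the combinations conditional logit would have to enumerate:

```python
import math

# Even a modest case explodes: 20 records with 10 events already yields
# 184,756 combinations, and 100 records with 50 events is astronomically many.
print(math.comb(20, 10))   # 184756
print(math.comb(100, 50))  # roughly 1e29
```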

So, in short, my recommendations are: (a) abandon Approach 1, since even if it did someday run the results would be wrong; and (b) try a simplified Approach 2; if you are lucky, maybe you will find that it runs once you get rid of 1 or 2 problematic variables. If it were legitimate to greatly reduce the number of time periods, maybe it would run. Thank you, Stephen and Richard. I will look at the variables more closely under Model 2, and will also run a Hausman test. Regarding the monstrous number of periods: that came from the high-frequency underlying data aggregated to hourly intervals.

I will try using lower frequency instead to reduce the number of periods. But just out of curiosity: Will linear probability models make any sense at all under such circumstances? I don't know if there is any work assessing the merits of a fixed effects LPM.

My intuition goes against it (for one thing, fixed effects logit could be analyzing very different cases than fixed effects LPM, since FE logit discards cases where the dependent variable does not vary across time), but my intuition probably shouldn't be the decisive factor in this case. I'd be interested to hear what others say. LOL, I was taught the same way. I have to admit, though (characteristic of Satan, perhaps), it's very tempting in these cases!

## The latent variable formulation

The latent variable formulation, which is standard in discrete choice models, makes clear the relationship between logistic regression (the "logit model") and the probit model, which uses an error variable distributed according to a standard normal distribution instead of a standard logistic distribution.

Both the logistic and normal distributions are symmetric with a basic unimodal, "bell curve" shape. The only difference is that the logistic distribution has somewhat heavier tails, which means that it is less sensitive to outlying data and hence somewhat more robust to model misspecifications or erroneous data. A two-way variant of this model has a separate latent variable and a separate set of regression coefficients for each possible outcome of the dependent variable.

The reason for this separation is that it makes it easy to extend logistic regression to multi-outcome categorical variables, as in the multinomial logit model. In such a model, it is natural to model each possible outcome using a different set of regression coefficients.

It is also possible to motivate each of the separate latent variables as the theoretical utility associated with making the associated choice, and thus motivate logistic regression in terms of utility theory. In terms of utility theory, a rational actor always chooses the choice with the greatest associated utility.

This is the approach taken by economists when formulating discrete choice models, because it both provides a theoretically strong foundation and facilitates intuitions about the model, which in turn makes it easy to consider various sorts of extensions.

See the example below. The choice of the type-1 extreme value distribution seems fairly arbitrary, but it makes the mathematics work out, and it may be possible to justify its use through rational choice theory. It turns out that this model is equivalent to the previous model, although this seems non-obvious, since there are now two sets of regression coefficients and error variables, and the error variables have a different distribution.

In fact, this model reduces directly to the previous one with the substitutions β = β₁ − β₀ and ε = ε₁ − ε₀. An intuition for this comes from the fact that, since we choose based on the maximum of two values, only their difference matters, not the exact values, and this effectively removes one degree of freedom. Another critical fact is that the difference of two type-1 extreme-value-distributed variables is a logistic distribution; i.e., ε = ε₁ − ε₀ follows a standard logistic distribution.
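The fact that the difference of two type-1 extreme-value (Gumbel) draws follows a standard logistic distribution can be checked by simulation; this is a rough sketch using inverse-transform sampling:

```python
import math
import random

random.seed(0)

def gumbel():
    # Standard type-1 extreme value (Gumbel) draw via inverse transform.
    return -math.log(-math.log(random.random()))

# Compare the empirical CDF of the difference at one point with the logistic CDF.
diffs = [gumbel() - gumbel() for _ in range(100_000)]
empirical = sum(d <= 1.0 for d in diffs) / len(diffs)
logistic_cdf = 1 / (1 + math.exp(-1.0))
print(round(empirical, 3), round(logistic_cdf, 3))
```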

As an example, consider a province-level election where the choice is between a right-of-center party, a left-of-center party, and a secessionist party. We would then use three latent variables, one for each choice. Then, in accordance with utility theory, we can interpret the latent variables as expressing the utility that results from making each of the choices.

We can also interpret the regression coefficients as indicating the strength that the associated factor (i.e., explanatory variable) has in contributing to the utility. A voter might expect that the right-of-center party would lower taxes, especially on rich people. This would give low-income people no benefit, i.e., no change in utility; it would give middle-income people a modest benefit; and it would give high-income people a significant benefit.

On the other hand, the left-of-center party might be expected to raise taxes and offset it with increased welfare and other assistance for the lower and middle classes.

This would cause significant positive benefit to low-income people, perhaps a weak benefit to middle-income people, and significant negative benefit to high-income people.

Finally, the secessionist party would take no direct actions on the economy, but simply secede. Yet another formulation combines the two-way latent variable formulation above with the original formulation (higher up, without latent variables), and in the process provides a link to one of the standard formulations of the multinomial logit.

Here, instead of writing the logit of the probabilities p_i as a linear predictor, we separate the linear predictor into two, one for each of the two outcomes:

    ln Pr(Y_i = 0) = β₀ · X_i − ln Z
    ln Pr(Y_i = 1) = β₁ · X_i − ln Z

The extra − ln Z term, as it turns out, serves as the normalizing factor ensuring that the result is a distribution. This can be seen by exponentiating both sides:

    Pr(Y_i = 0) = (1/Z) e^(β₀ · X_i)
    Pr(Y_i = 1) = (1/Z) e^(β₁ · X_i)

In this form it is clear that the purpose of Z is to ensure that the resulting distribution over Y_i is in fact a probability distribution, i.e. that it sums to 1. This means that Z is simply the sum of all un-normalized probabilities, and by dividing each probability by Z, the probabilities become "normalized".

That is:

    Z = e^(β₀ · X_i) + e^(β₁ · X_i)

This shows clearly how to generalize this formulation to more than two outcomes, as in multinomial logit.

Note that this general formulation is exactly the softmax function, as in

    Pr(Y_i = c) = e^(β_c · X_i) / (e^(β₀ · X_i) + e^(β₁ · X_i))

In fact, adding any constant vector to both coefficient vectors will produce the same probabilities, because the corresponding factor appears in both the numerator and the denominator and cancels. As a result, we can simplify matters, and restore identifiability, by picking an arbitrary value for one of the two vectors (conventionally setting β₀ = 0). Note that most treatments of the multinomial logit model start out either by extending the "log-linear" formulation presented here or the two-way latent variable formulation presented above, since both clearly show the way that the model could be extended to multi-way outcomes.
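The invariance under adding a constant can be demonstrated directly (a minimal sketch with invented scores):

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    z = sum(exps)              # the normalizing factor Z
    return [e / z for e in exps]

a = softmax([2.0, 0.5])
b = softmax([2.0 + 7.0, 0.5 + 7.0])  # shift both linear predictors by 7
print(a)
print(b)  # effectively identical: the shift cancels in Z
```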

In general, the presentation with latent variables is more common in econometrics and political science , where discrete choice models and utility theory reign, while the "log-linear" formulation here is more common in computer science , e.

This functional form is commonly called a single-layer perceptron or single-layer artificial neural network. A single-layer neural network computes a continuous output instead of a step function. With the logistic (sigmoid) function as the activation, the single-layer neural network is identical to the logistic regression model.

This function has a continuous derivative, which allows it to be used in backpropagation. This function is also preferred because its derivative is easily calculated:

    g′(z) = g(z) (1 − g(z))

A closely related model assumes that each i is associated not with a single Bernoulli trial but with n_i independent identically distributed trials, where the observation Y_i is the number of successes observed (the sum of the individual Bernoulli-distributed random variables), and hence follows a binomial distribution:

    Y_i ~ Binomial(n_i, p_i)

An example of this distribution is the fraction of seeds (p_i) that germinate after n_i are planted.
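The derivative identity g′(z) = g(z)(1 − g(z)) can be confirmed numerically with a central difference (a minimal sketch):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Check d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)) at an arbitrary point.
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(round(numeric, 8), round(analytic, 8))
```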

In terms of expected values, this model is expressed as

    p_i = E[Y_i / n_i | X_i],  so that  logit(p_i) = β · X_i

In a Bayesian statistics context, prior distributions are normally placed on the regression coefficients, usually in the form of Gaussian distributions. There is no conjugate prior of the likelihood function in logistic regression. When Bayesian inference was performed analytically, this made the posterior distribution difficult to calculate except in very low dimensions.

However, when the sample size or the number of parameters is large, full Bayesian simulation can be slow, and people often use approximate methods such as variational Bayesian methods and expectation propagation.

## History

A detailed history of logistic regression is given in Cramer (2002). The logistic function was independently developed in chemistry as a model of autocatalysis (Wilhelm Ostwald, 1883). This naturally gives rise to the logistic equation for the same reason as population growth: the reaction is self-reinforcing but constrained.

Pearl and Reed were initially unaware of Verhulst's work and presumably learned about it from L. Gustave du Pasquier, but they gave him little credit and did not adopt his terminology. In the 1930s, the probit model was developed and systematized by Chester Ittner Bliss, who coined the term "probit" (Bliss, 1934), and by John Gaddum (Gaddum, 1933), and the model was fit by maximum likelihood estimation by Ronald A. Fisher (Fisher, 1935), as an addendum to Bliss's work.

The probit model influenced the subsequent development of the logit model, and these models competed with each other. By 1970, the logit model achieved parity with the probit model in use in statistics journals and thereafter surpassed it. This relative popularity was due to the adoption of the logit outside of bioassay, rather than displacing the probit within bioassay, and its informal use in practice; the logit's popularity is credited to the logit model's computational simplicity, mathematical properties, and generality, allowing its use in varied fields.

Various refinements occurred during that time, notably by David Cox, as in Cox (1958). The multinomial logit model was introduced independently in Cox (1966) and Theil (1969), which greatly increased the scope of application and the popularity of the logit model.

Most statistical software can do binary logistic regression. Notably, Microsoft Excel's statistics extension package does not include it.


## References

• Agresti, Alan. Categorical Data Analysis. New York: Wiley-Interscience.
• Amemiya, Takeshi. Advanced Econometrics. Oxford: Basil Blackwell.
• Balakrishnan, N. Handbook of the Logistic Distribution. Marcel Dekker, Inc.
• Cox, David R. Journal of the Royal Statistical Society, Series B.
• Cramer, J. S. The Origins of Logistic Regression. Technical report, Tinbergen Institute.
• Freedman, David A. Statistical Models: Theory and Practice. Cambridge University Press.
• Greene, William H. Econometric Analysis, 5th ed. Prentice Hall.
• Harrell, Frank E. Regression Modeling Strategies, 2nd ed. Springer Series in Statistics. New York: Springer.
• Hilbe, Joseph M. Logistic Regression Models.
• Hosmer, David. Applied Logistic Regression, 2nd ed. Hoboken, NJ: Wiley.
• Howell, David C. Statistical Methods for Psychology, 7th ed. Belmont, CA: Thomson Wadsworth.
• James, G., D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. New York: Springer.
• Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. The MIT Press.
• Peduzzi, P., J. Concato, E. Kemper, T. Holford, and A. Feinstein. Journal of Clinical Epidemiology.
• Theil, Henri. International Economic Review.
• Wilson, E. B. Proceedings of the National Academy of Sciences.
• Zarembka, P. (ed.). Frontiers in Econometrics. New York: Academic Press.

## Stata example: logit regression

Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model, the log odds of the outcome is modeled as a linear combination of the predictor variables. Please note: the purpose of this section is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do.

In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses. For our data analysis below, we are going to expand on Example 2 about getting into graduate school. We have generated hypothetical data, which can be obtained from our website. We will treat the variables gre and gpa as continuous. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest.

Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable, while others have either fallen out of favor or have limitations.

• OLS regression. When used with a binary response variable, this model is known as a linear probability model and can be used as a way to describe conditional probabilities. However, the errors (i.e., residuals) from the linear probability model violate the homoskedasticity and normality-of-errors assumptions of OLS regression.
• Two-group discriminant function analysis. A multivariate method for dichotomous outcome variables. This will produce an overall test of significance but will not give individual coefficients for each variable.

The i. prefix tells Stata to treat rank as a categorical (factor) variable; note that this syntax was introduced in Stata 11. The likelihood ratio chi-square indicates that the model as a whole is statistically significant. Both gre and gpa are statistically significant, as are the three indicator variables for rank. The logistic regression coefficients give the change in the log odds of the outcome for a one-unit increase in the predictor variable.

We can test for an overall effect of rank using the test command. Below we see that the overall effect of rank is statistically significant.

We can also test additional hypotheses about the differences in the coefficients for different levels of rank. Note that if we wanted to estimate this difference, we could do so using the lincom command. You could also use the logistic command. Now we can say that for a one-unit increase in gpa, the odds of being admitted to graduate school (versus not being admitted) increase by a factor of about 2.
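Mechanically, an odds ratio is just the exponential of a coefficient. A sketch with an invented coefficient value (the actual estimate is elided above):

```python
import math

b_gpa = 0.7          # hypothetical logit coefficient for gpa
odds_ratio = math.exp(b_gpa)
print(round(odds_ratio, 3))  # about 2.01: odds multiply by ~2 per unit of gpa
```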

For more information on interpreting odds ratios, see our FAQ page "How do I interpret odds ratios in logistic regression?" You can also use predicted probabilities to help you understand the model. You can calculate predicted probabilities using the margins command, which was introduced in Stata 11. Below we use the margins command to calculate the predicted probability of admission at each level of rank, holding all other variables in the model at their means.

For more information on using the margins command to calculate predicted probabilities, see our page "Using margins for predicted probabilities." In the above output we see the predicted probability of being accepted into a graduate program at each level of rank. Below we generate the predicted probabilities for a range of values of gre in fixed increments. Because we have not specified either atmeans or used at() to specify the values at which the other predictor variables are held, the values in the table are average predicted probabilities calculated using the sample values of the other predictor variables.
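The mechanics behind these predicted probabilities can be sketched by hand. The coefficients below are invented placeholders, not the estimates from the elided output:

```python
import math

# Hypothetical coefficients: intercept, gre, gpa (rank omitted for brevity).
b0, b_gre, b_gpa = -4.0, 0.002, 0.8

def predicted_probability(gre, gpa):
    xb = b0 + b_gre * gre + b_gpa * gpa   # linear predictor
    return 1 / (1 + math.exp(-xb))        # inverse logit

prob = predicted_probability(600, 3.5)
print(round(prob, 3))
```

Commands like margins simply evaluate this expression at chosen covariate values, or average it over the sample.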

In the table above we can see that the mean predicted probability of being accepted is quite low. We may also wish to see measures of how well our model fits.

This can be particularly useful when comparing competing models. The user-written command fitstat produces a variety of fit statistics. You can find more information on fitstat by typing search fitstat (see "How can I use the search command to search for programs and get additional help?"). See also the Stata help for logit, the annotated output for the logistic command, and "Interpreting logistic regression in all its forms" (PDF).

Logistic regression, the focus of this page. Probit regression: probit analysis will produce results similar to logistic regression; the choice of probit versus logit depends largely on individual preferences. For a more thorough discussion of these and other problems with the linear probability model, see Long. The log likelihood is shown at the top of the output. Also at the top of the output we see that all observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values).

For every one-unit change in gre, the log odds of admission (versus non-admission) increases by the gre coefficient. For a one-unit increase in gpa, the log odds of being admitted to graduate school increases by the gpa coefficient. The indicator variables for rank have a slightly different interpretation.

For example, having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 1, decreases the log odds of admission by the corresponding coefficient. Stata will exponentiate the coefficients into odds ratios for you if you use the or option, illustrated below. If a cell has very few cases (a small cell), the model may become unstable or it might not run at all.

Separation or quasi-separation (also called perfect prediction) is a condition in which the outcome does not vary at some levels of the independent variables.

It is sometimes possible to estimate models for binary outcomes in datasets with only a small number of cases using exact logistic regression, via the exlogistic command. For more information, see our data analysis example for exact logistic regression. It is also important to keep in mind that when the outcome is rare, even if the overall dataset is large, it can be difficult to estimate a logit model. Pseudo-R-squared measures all attempt to provide information similar to that provided by R-squared in OLS regression; however, none of them can be interpreted exactly as R-squared in OLS regression is interpreted.

For a discussion of model diagnostics for logistic regression, see Hosmer and Lemeshow, Chapter 5. Note that diagnostics done for logistic regression are similar to those done for probit regression. In Stata, values of 0 are treated as one level of the outcome variable, and all other non-missing values are treated as the second level of the outcome.