This package wouldn't have existed without the invaluable feedback and contributions of Paulo Guimarães, Amine Ouazad, Mark E. Schaffer, Kit Baum, Tom Zylkin, and Matthieu Gomez.

When I predict, I find something I consider unexpected: the fitted values do not seem to incorporate the fixed effects exactly.

Note that e(M3) and e(M4) are only conservative estimates, so we will usually be overestimating the standard errors.

In my regression model (Y ~ A:B), a numeric variable (A) interacts with a categorical variable (B).

Even with only one level of fixed effects, it is. If that is the case, then the slope is collinear with the intercept.

The problem with predicting out of sample with FEs is that you don't know the fixed effect of an individual that was not in sample, so you cannot compute alpha + beta * x.

For instance, vce(cluster firm year) will estimate SEs with firm and year clustering (two-way clustering). cache(clear) will delete the Mata objects created by reghdfe and kept in memory after the save(cache) operation.

    reghdfe lprice i.foreign, absorb(FE = rep78) resid
    margins foreign, expression(exp(predict(xbd))) atmeans

On a related note, is there a specific reason for what you want to achieve?

    program define reghdfe_old_p
    * (Maybe refactor using _pred_se ??)

This works for me as a quick and dirty workaround, but I'd somehow expect this to be the default behaviour when I use , xbd.

all is the default and almost always the best alternative. The classical transform is Kaczmarz (kaczmarz), and more stable alternatives are Cimmino (cimmino) and Symmetric Kaczmarz (symmetric_kaczmarz). 27(2), pages 617-661. Note that parallel() will only speed up execution in certain cases.
reghdfe is a Stata package that runs linear and instrumental-variable regressions with many levels of fixed effects, by implementing the estimator of Correia (2015).

Note: Each transform is just a plug-in Mata function, so a larger number of acceleration techniques are available, albeit undocumented (and slower).

If only group() is specified, the program will run with one observation per group.

poolsize(#) Number of variables that are pooled together into a matrix that will then be transformed.

Calculates the degrees-of-freedom lost due to the fixed effects (note: beyond two levels of fixed effects, this is still an open problem, but we provide a conservative approximation).

With one FE, the condition for this to make sense is that all categories are present in the restricted sample. Example:

    clear
    set obs 100
    gen x1 = rnormal()
    gen x2 = rnormal()
    gen d.

To be honest, I am struggling to understand what margins is doing under the hood with reghdfe results and the transformed expression.

IV/2SLS was available in version 3 but moved to ivreghdfe on version 4), this option allows you to run the previous versions without having to install them (they are already included in the reghdfe installation).

Here is an MWE to illustrate. Equivalent to ".

(note: as of version 2.1, the constant is no longer reported) Ignore the constant; it doesn't tell you much.

Without any adjustment, we would assume that the degrees-of-freedom used by the fixed effects is equal to the count of all the fixed effects (e.g.

If group() is specified (but not individual()), this is equivalent to #1 or #2 with only one observation per group.

technique(lsmr) uses the Fong and Saunders LSMR algorithm; it is a fast and stable option.

See the discussion in Baum, Christopher F., Mark E. Schaffer, and Steven Stillman.

In contrast, other production functions might scale linearly, in which case "sum" might be the correct choice.

For the rationale behind interacting fixed effects with continuous variables, see: Duflo, Esther.
simonheb commented on Jul 17, 2018.

In general, tighter tolerances (1e-8 to 1e-14) return more accurate results, but more slowly.

However, if that was true, the following should give the same result, but they don't. predict after reghdfe doesn't do so.

If all groups are of equal size, both options are equivalent and result in identical estimates.

Thanks!

will call the latest 2.x version of reghdfe instead (see the.

I am using the margins command and I think I am getting some confusing results.

For the second FE, the number of connected subgraphs with respect to the first FE will provide an exact estimate of the degrees-of-freedom lost, e(M2). For the fourth FE, we compute G(1,4), G(2,4), and G(3,4) and again choose the highest for e(M4).

transform(str) allows for different "alternating projection" transforms.

This is it.

Sergio Correia, Board of Governors of the Federal Reserve. Email: sergio.correia@gmail.com
Noah Constantine, Board of Governors of the Federal Reserve. Email: noahbconstantine@gmail.com

The paper explaining the specifics of the algorithm is a work-in-progress and available upon request. reghdfe requires the ftools package (Github repo). To check or contribute to the latest version of reghdfe, explore the Github repository.

You can pass suboptions not just to the iv command but to all stage regressions with a comma after the list of stages.

In other words, an absvar of var1##c.var2 converges easily, but an absvar of var1#c.var2 will converge slowly and may require a tighter tolerance.

Then you can plot these __hdfe* parameters however you like.

Hi Sergio,

The most useful are count range sd median p##.

suboptions() options that will be passed directly to the regression command (either regress, ivreg2, or ivregress).

vce(vcetype, subopt) specifies the type of standard error reported.
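The statement above that the number of connected subgraphs between the first and second fixed effects gives the degrees-of-freedom lost, e(M2), can be made concrete with a small sketch. This is an illustration in Python using a union-find structure, not reghdfe's Mata code; the function name and the worker/firm data are hypothetical.

```python
def count_connected_subgraphs(first_fe, second_fe):
    """Count connected components of the bipartite graph whose nodes are the
    levels of two fixed effects and whose edges are the observations."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root,
        # halving the path on the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in zip(first_fe, second_fe):
        root_a = find(("fe1", a))
        root_b = find(("fe2", b))
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    return len({find(node) for node in list(parent)})


# Hypothetical data: workers 1 and 2 are linked through firms A and B,
# while worker 3 only ever appears at firm C.
workers = [1, 1, 2, 3, 3]
firms = ["A", "B", "B", "C", "C"]
print(count_connected_subgraphs(workers, firms))
```

Each separate component contributes one redundant coefficient, which is why this count is subtracted when computing the degrees of freedom absorbed by the second fixed effect.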
nofootnote suppresses display of the footnote table that lists the absorbed fixed effects, including the number of categories/levels of each fixed effect, redundant categories (collinear or otherwise not counted when computing degrees-of-freedom), and the difference between both.

e(M1)==1), since we are running the model without a constant.

, kiefer estimates standard errors consistent under arbitrary intra-group autocorrelation (but not heteroskedasticity) (Kiefer).

I know this is a long post so please let me know if something is unclear.

ivreg2, by Christopher F Baum, Mark E Schaffer, and Steven Stillman, is the package used by default for instrumental-variable regression.

What do we use for estimates of the turn fixed effects for values above 40?

Simen Gaure.

Sorry, so here is the code I have so far:

    gen lwage = log(wage)
    ** Fixed-effect regressions
    * Over the whole sample
    egen lw_var = sd(lwage)
    replace lw_var = lw_var^2
    * Within/Between firms
    reghdfe lwage, abs(firmid, savefe)
    predict fwithin if e(sample), res
    predict fbetween if e(sample), xbd
    egen temp=sd .

Thanks!

In your case, it seems that excluding the FE part gives you the same results under -atmeans-.

This option does not require additional computations and is required for subsequent calls to predict, d.

summarize(stats) this option is now part of sumhdfe.

verbose(#) orders the command to print debugging information.

when saving residuals, fixed effects, or mobility groups), and is incompatible with most postestimation commands.

Each clustervar permits interactions of the type var1#var2 (this is faster than using egen group() for a one-off regression).

Time-varying executive boards & board members.

Multiple heterogeneous slopes are allowed together.

This variable is not automatically added to absorb(), so you must include it in the absvar list.

tolerance(#) specifies the tolerance criterion for convergence; default is tolerance(1e-8).
If that is not the case, an alternative may be to use clustered errors, which as discussed below will still have their own asymptotic requirements.

I will leave it open.

absorb() is required.

Coded in Mata, which in most scenarios makes it even faster than areg and xtreg for a single fixed effect (see benchmarks on the Github page).

summarize(stats) will report and save a table of summary statistics of the regression variables (including the instruments, if applicable), using the same sample as the regression.

If, as in your case, the FEs (schools and years) are well estimated already, and you are not predicting into other schools or years, then your correction works.

Example: reghdfe price weight, absorb(turn trunk, savefe).

Hi Sergio, thanks for all your work on this package.

It looks like you want to run a log(y) regression and then compute exp(xb).

to run forever until convergence.

"Enhanced routines for instrumental variables/GMM estimation and testing."

They are probably inconsistent / not identified and you will likely be using them wrong.

individual, save) and after the reghdfe command is through I store the estimates through estimates store, if I then load the data for the full sample (both 2008 and 2009) and try to get the predicted values through:

Discussion on e.g. "OLS with Multiple High Dimensional Category Dummies".

So they were identified from the control group and I think theoretically the idea is fine.

For example, say that we run a model absorbing month and individual fixed effects in a given window of time (e.g.

margins?

technique(map) (default) will partial out variables using the "method of alternating projections" (MAP) in any of its variants. Most time is usually spent on three steps: map_precompute(), map_solve() and the regression step.
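To make the "method of alternating projections" mentioned above more tangible, here is a minimal sketch in plain Python. It repeatedly demeans a variable by each fixed effect in turn until a full sweep changes nothing by more than the tolerance. The function names, the stopping rule (largest absolute change per sweep), and the data are assumptions for illustration; reghdfe's Mata implementation uses its own transforms, acceleration, and convergence criterion. The defaults 1e-8 and 16,000 iterations are the ones quoted in the text.

```python
from collections import defaultdict


def demean_by(values, groups):
    """One projection step: subtract each group's mean from its members."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group]
            for value, group in zip(values, groups)]


def partial_out(values, fe_list, tolerance=1e-8, max_iter=16000):
    """Alternate group-demeaning over every fixed effect until a full sweep
    changes no observation by more than `tolerance`."""
    current = list(values)
    for iteration in range(1, max_iter + 1):
        previous = current
        for groups in fe_list:
            current = demean_by(current, groups)
        change = max(abs(a - b) for a, b in zip(current, previous))
        if change < tolerance:
            return current, iteration
    raise RuntimeError("alternating projections did not converge")


# Hypothetical unbalanced panel with firm and year fixed effects.
firm = [1, 1, 2, 2, 3, 3]
year = [1, 2, 1, 2, 1, 3]
y = [1.0, 2.0, 4.0, 3.0, 5.0, 9.0]
y_tilde, sweeps = partial_out(y, [firm, year])
print(sweeps, [round(v, 6) for v in y_tilde])
```

Applying the same routine to the dependent variable and to every regressor, then running OLS on the transformed variables, yields the same slope coefficients as including all the dummies explicitly.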
What element are you trying to estimate?

For a careful explanation, see the ivreg2 help file, from which the comments below borrow.

Maybe ppmlhdfe for the first and bootstrap the second?

Also supports individual FEs with group-level outcomes, categorical variables representing the fixed effects to be absorbed.

residuals(newvar) will save the regression residuals in a new variable.

I use the command to estimate the model:

    reghdfe wage X1 X2 X3, absvar(p=Worker_ID j=Firm_ID)

I then check:

    predict xb, xb
    predict res, r
    gen yhat = xb + p + j + res

and find that yhat wage.

The following suboptions require either the ivreg2 or the avar package from SSC.

The goal of this library is to reproduce the brilliant regHDFE Stata package on Python.

Finally, we compute e(df_a) = e(K1) - e(M1) + e(K2) - e(M2) + e(K3) - e(M3) + e(K4) - e(M4); where e(K#) is the number of levels or dimensions for the #-th fixed effect (e.g.

reghdfe is a generalization of areg (and xtreg,fe, xtivreg,fe) for multiple levels of fixed effects, and multi-way clustering.

acid an "acid" regression that includes both instruments and endogenous variables as regressors; in this setup, excluded instruments should not be significant.

More suboptions available:
preserve the dataset and drop variables as much as possible on every step
control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
amount of debugging information to show (0=None, 1=Some, 2=More, 3=Parsing/convergence details, 4=Every iteration)
show elapsed times by stage of computation
run previous versions of reghdfe

avar, by Christopher F Baum and Mark E Schaffer, is the package used for estimating the HAC-robust standard errors of OLS regressions.
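The check quoted above (adding up xb, the saved fixed effects, and the residual to recover the outcome) and the out-of-sample problem raised earlier in the thread can both be seen in a small one-way example. The sketch below is plain Python with a single regressor and a single fixed effect; all function names and data are hypothetical, and it is not how reghdfe computes its predictions.

```python
from collections import defaultdict


def group_means(values, groups):
    """Mean of `values` within each level of `groups`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def fit_one_way_fe(y, x, groups):
    """Within estimator: slope from demeaned data, then one intercept per group."""
    y_bar, x_bar = group_means(y, groups), group_means(x, groups)
    y_within = [yi - y_bar[g] for yi, g in zip(y, groups)]
    x_within = [xi - x_bar[g] for xi, g in zip(x, groups)]
    beta = (sum(a * b for a, b in zip(x_within, y_within))
            / sum(a * a for a in x_within))
    fixed_effects = {g: y_bar[g] - beta * x_bar[g] for g in y_bar}
    return beta, fixed_effects


def predict_xbd(x_value, group, beta, fixed_effects):
    """xb plus the group's fixed effect; None when the group was never in
    the estimation sample, because its fixed effect is unknown."""
    if group not in fixed_effects:
        return None
    return beta * x_value + fixed_effects[group]


# Hypothetical data generated as y = 2*x + alpha_g with alpha_a = 10, alpha_b = -5.
y = [12.0, 14.0, 18.0, -5.0, 1.0, 5.0]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0]
groups = ["a", "a", "a", "b", "b", "b"]
beta, fe = fit_one_way_fe(y, x, groups)
residuals = [yi - predict_xbd(xi, g, beta, fe) for yi, xi, g in zip(y, x, groups)]
print(beta, fe, residuals)
print(predict_xbd(1.0, "c", beta, fe))  # group "c" was never observed
```

In sample, outcome = xb + fixed effect + residual holds by construction; for a group outside the sample there is simply no fixed effect to add.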
The algorithm used for this is described in Abowd et al (1999), and relies on results from graph theory (finding the number of connected sub-graphs in a bipartite graph).

Fast, but less precise than LSMR at default tolerance (1e-8).

Iteratively drop singleton groups and, more generally, reduce the linear system into its 2-core graph.

At the other end, loose tolerances (less strict than 1e-6) are not generally recommended, as the iteration might have been stopped too soon, and thus the reported estimates might be incorrect.

& Miller, Douglas L., 2011.

You can check their respective help files here: reghdfe3, reghdfe5.

Requires pairwise, firstpair, or the default all.

"The medium run effects of educational expansion: Evidence from a large school construction program in Indonesia."

firstpair will exactly identify the number of collinear fixed effects across the first two sets of fixed effects (i.e.
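The "iteratively drop singleton groups" step above has to be iterative because removing one singleton can create another. A minimal Python sketch follows; the function name and data are hypothetical, and it covers only singleton removal, not the further reduction to the 2-core graph that the text also mentions.

```python
from collections import Counter


def drop_singletons(fe_columns):
    """Return the sorted indices of observations that survive after repeatedly
    removing any observation that is alone in one of its fixed-effect groups."""
    keep = set(range(len(fe_columns[0])))
    changed = True
    while changed:
        changed = False
        for column in fe_columns:
            counts = Counter(column[i] for i in keep)
            singletons = {i for i in keep if counts[column[i]] == 1}
            if singletons:
                keep -= singletons
                changed = True
    return sorted(keep)


# Hypothetical data: dropping firm C and year 3 leaves firm B with a single
# observation, which in turn leaves year 2 with a single observation.
firm = ["A", "A", "A", "B", "B", "C"]
year = [1, 2, 1, 2, 3, 1]
print(drop_singletons([firm, year]))
```

Singleton observations are fitted perfectly by their own fixed effect, so they carry no information about the slopes, which is the usual motivation for dropping them before estimation.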
Valid values are:
allows selecting the desired adjustments for degrees of freedom; rarely used but changing it can speed up execution
unique identifier for the first mobility group
partial out variables using the "method of alternating projections" (MAP) in any of its variants (default)
variation of Spielman et al's graph-theoretical (GT) approach (using spectral sparsification of graphs); currently disabled
MAP acceleration method; options are conjugate_gradient (
prune vertices of degree-1; acts as a preconditioner that is useful if the underlying network is very sparse; currently disabled
criterion for convergence (default=1e-8, valid values are 1e-1 to 1e-15)
maximum number of iterations (default=16,000); if set to missing (
solve normal equations (X'X b = X'y) instead of the original problem (X b = y)

    reghdfe dep_var ind_vars, absorb(i.fixeff1 i.fixeff2, savefe) cluster(t) resid

My attempts yield errors:

    xtqptest _reghdfe_resid, lags(1)

yields "_reghdfe_resid: Residuals do not appear to include the fixed effect", which is based on ue = c_i + e_it.

LSMR is an iterative method for solving sparse least-squares problems; analytically equivalent to the MINRES method on the normal equations.

To do so, the data must be stored in a long format (e.g.

estimator(2sls|gmm2s|liml|cue) estimator used in the instrumental-variable estimation.

In an i.categorical##c.continuous interaction, we count the number of categories where c.continuous is always the same constant.

Not sure if I should add an F-test for the absvars in the vce(robust) and vce(cluster) cases.

However, future replays will only replay the iv regression.

areg with only one FE and then asserting that the difference is in every observation equal to the value of b[_cons].

Note that this allows for groups with a varying number of individuals (e.g.

To follow, you need the latest versions of reghdfe and ftools (from github):

In this line, we run Stata's test to get e(df_m).
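The rule above for i.categorical##c.continuous interactions (count the categories in which the continuous variable never varies, since there the slope is collinear with the category intercept) amounts to a simple per-group check. The Python sketch below uses a hypothetical function name and data and is only meant to show the counting logic.

```python
def count_constant_slope_categories(categories, continuous):
    """Number of categories in which the continuous variable takes one value
    only, so a category-specific slope cannot be told apart from the
    category intercept."""
    values_by_category = {}
    for category, value in zip(categories, continuous):
        values_by_category.setdefault(category, set()).add(value)
    return sum(1 for values in values_by_category.values() if len(values) == 1)


# Hypothetical data: "a" has a constant value, "b" varies, "c" has one observation.
categories = ["a", "a", "b", "b", "c"]
continuous = [1.0, 1.0, 2.0, 3.0, 5.0]
print(count_constant_slope_categories(categories, continuous))
```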
This introduces a serious flaw: whenever a fraud event is discovered, i) future firm performance will suffer, and ii) a CEO turnover will likely occur. It will run, but the results will be incorrect.

(If you are interested in discussing these or others, feel free to contact me)

As above, but also compute clustered standard errors
Factor interactions in the independent variables
Interactions in the absorbed variables (notice that only the # symbol is allowed)
Interactions in both the absorbed and AvgE variables (again, only the # symbol is allowed)

Note: it also keeps most e() results placed by the regression subcommands (ivreg2, ivregress).

Sergio Correia, Fuqua School of Business, Duke University. Email: sergio.correia@duke.edu

reghdfe with margins, atmeans - possible bug.

which returns: you must add the resid option to reghdfe before running this prediction.

Additionally, if you previously specified preserve, it may be a good time to restore.

This is useful almost exclusively for debugging.

If you want to perform tests that are usually run with suest, such as non-nested models, tests using alternative specifications of the variables, or tests on different groups, you can replicate it manually, as described here.