Evaluating the impact of simultaneous multithreading on network servers using real hardware
Original text for foreign-language translation

Pollution Haven Hypothesis and Environmental Impacts of Foreign Direct Investment: The Case of Industrial Emission of Sulfur Dioxide (SO2) in Chinese Provinces

Material source: CERDI, Etudes et Documents, Ec2005,06
Author: Jie HE

Abstract

Recognizing the complex inter-correlation between FDI, emission and the three economic determinants of emission, we construct a simultaneous model to study the FDI-emission nexus in China by exploring both the dynamic recursive FDI entry decision and the linkage from FDI entry to final emission results under the intermediation of the scale, composition and technique effects. The model is then estimated on panel data of industrial SO2 emission in China's 29 provinces. The results show that, exerted through different channels, the total impact of FDI on industrial SO2 emission is very small. With a 1% increase in FDI capital stock, industrial SO2 emission will increase by 0.099%, in which the emission increase caused by impact of FDI's role in reinforcement of environmental regulation. By introducing into the simultaneous system a recursive dynamism that supposes the FDI entry decision to depend on the last period's economic growth and environmental regulation stringency, our model also provides convincing supportive evidence for the 'pollution haven' hypothesis. Although FDI enterprises in China generally produce with higher pollution efficiency, the rise in environmental regulation stringency still has a modest deterrent effect on FDI capital inflow.
Furthermore, the composition transformation impact of FDI in China seems to be dominated by the inflow of foreign capital pursuing a 'production platform' that provides lower pollution regulation compliance cost.

Key words: foreign direct investment, industrial SO2 emission, simultaneous, scale effect, composition effect, income effect, pollution haven hypothesis

Introduction

The market-oriented economic reform has gradually turned China into one of the most attractive destinations for foreign direct investment (FDI) in the world. During most of the 1990s, China was the world's second largest FDI recipient, just behind the United States, and the largest FDI recipient in the developing world. After entering the new millennium, contrary to the decreasing tendency of FDI inflows in many OECD economies due to their sluggish macroeconomic performance, China experienced a steady increase in FDI inflow. According to OECD (2004), China became the world's biggest FDI recipient in 2003, with an annual FDI inflow of about 53.3 billion US dollars, considerably higher than that of Germany (47 billion USD) and the United States (40 billion USD) in the same year. During 25 years of economic reform, China has received in total almost 500 billion USD of foreign direct investment (SSB, 2004). From the evolution of annual FDI actually utilized in China since 1978, illustrated in Figure 1, we observe a generally increasing trajectory, with the most important rise happening after 1993.
OECD (2004) indicated that in the early years of China's economic reform, FDI choosing China as a destination aimed at integrating its cheap labor resource into a global production chain, but recently there is an increasing tendency for foreign companies to invest in China as part of their strategies to serve local clients or to acquire a strategic position in China's enormous market.

(Insert Figure 1 about here)

However, China's remarkable opening process during the last 25 years seems to have been accompanied by obvious environmental pollution problems. The air pollution situation in urban areas started deteriorating quickly in the first decade of economic reform, in the 1980s. Although some improvement came during the 1990s owing to the reinforcement of pollution control policies, 2/3 of Chinese cities still fail to meet the air quality standard established by China's Environment Protection Agency (EPA), which signifies that more than 3/4 of the urban population are exposed to seriously polluted air. What is the possible relationship between the rapid FDI inflow and the air pollution deterioration? Should the inflow of FDI be held responsible for China's air pollution situation?

Aiming at a better understanding of the FDI-environment nexus, this paper constructs a five-equation simultaneous system that includes both the FDI location decision with respect to the host country's environmental regulation stringency and the impact of FDI on pollution through various underlying simultaneous mechanisms. This simultaneous system is then tested on panel data of industrial sulfur dioxide (SO2) emission of the 29 Chinese provinces during the period 1994-2001, during which FDI inflow experienced its most important increase.
The time-constant specific effect for each province is captured by fixed-effect parameters. To correct potential first-order serial correlation and heteroskedasticity in each estimation function, an instrumentation method inspired by both the GMM-system estimator of Blundell and Bond (1998) and Sevestre and Trognon (1996) for dynamic panel data is used at the equation level. Finally, to employ the full information imparted by the simultaneous system and to avoid inconsistency in estimation caused by inter-equation residual correlation, we use the Generalized Method of Moments (GMM) estimator for simultaneous systems to estimate the whole system.

The organization of the paper is as follows. In the second section, we make a brief literature review to explain the necessity of investigating the relationship between FDI and environment through a structural simultaneous system, by revealing the complexity of the FDI-pollution nexus. Section 3 gives a simple introduction to the industrial SO2 emission and FDI situation in Chinese provinces, and the simultaneous model is presented in the fourth section. The econometric results are presented and discussed in Section 5. Finally, we conclude in Section 6.

2. FDI-pollution nexus literature review

Most of the existing literature did not directly treat the FDI-pollution nexus but based its analyses on the causality from environmental regulation stringency to firms' competitiveness as an entry point. These studies supposed that, under globalization, the relatively lax environmental regulation in developing countries becomes an attractive comparative advantage for pollution-intensive foreign capital seeking a 'pollution haven' to avoid paying costly pollution control compliance expenditure domestically. Though this 'pollution haven' hypothesis sounds reasonable, almost no empirical analysis has yet provided convincing supportive evidence revealing FDI's search for 'production platforms' permitting lower pollution abatement cost.
Besides the potential explanation residing in measurement problems for both environmental regulation stringency and FDI flows, most authors attributed the inability to detect a significant regulation-FDI flow nexus to the complexity of the relationship between them. Firstly, compared to the classical determinants of the FDI location decision, such as conventional production factor cost, tax rate differentials, the host country's market size, exchange rate risk, trade impediments and market power, environmental regulation compliance cost is not a critical cost factor for most private firms. Dasgupta, Wang and Wheeler (1997) find that the control cost for sulfur dioxide pollution in large-scale Chinese industrial enterprises is just a few dollars per ton until the control rate rises above 70%. Various studies based on developed countries' firm-level data also found that the total factor productivity decline caused by reinforced environmental regulations generally stays modest (Denison, 1979; Gray, 1987; Haveman and Christiansen, 1981, etc.). This suggests that the pollution control cost differential does not provide OECD firms with a strong incentive to move offshore (Jaffe et al., 1995). Secondly, different from the 'pollution haven' hypothesis, a classical economic reasoning based on the analogy of the traditional static comparative advantage perspective, the Porter hypothesis asserts that, from a dynamic point of view, environmental regulation stringency can encourage efficiency innovation and guide production procedures to be more environment-friendly (Porter and Linde, 1995; Xepapadeas and Zeeuw, 1999).
This dynamic technical progress can further induce a 'negative cost', which will benefit productivity reinforcement owing to a cleaner environment (Jaffe, 1995). Following this point of view, firms' 'technology profiting' activities, catalyzed by the reinforcement of environmental control policy, will be able to cancel out the differential in pollution abatement cost between countries, so capital flight due to this differential is actually unnecessary in the long run. Thirdly, the insignificance of environmental regulation in explaining capital flow might also be due to the potentially reversed causality between these two phenomena. On one hand, for a developed economy, the 'racing-to-the-bottom' hypothesis emphasizes the possibility that profit-driven capital outflow pursuing the lowest production cost might create pressure on the government to lower its environmental standard (Revesz, 199?). On the other hand, several 'pollution haven' studies based on the historical experiences of developing countries showed that, as income increases with FDI inflow, environmental regulation, strongly correlated with income level, will also increase with FDI inflow; therefore the 'pollution haven' should only be a transient phenomenon (Mani and Wheeler, 1997). Given these two aspects, the cost gap in emission abatement between developed and developing countries should tend to decrease with the inter-country movements of FDI. Finally, most of the 'pollution haven' studies used the total pollution abatement cost as an approximation for environmental regulation stringency. However, to some extent, this indicator can also be regarded as a measurement of the total technical effort of the host economy on pollution abatement, in which we should not ignore the contribution from the technically more efficient FDI firms.
Going a step further, even if we can prove the causality from environmental regulation stringency to dirty FDI inflow into developing countries, this does not immediately mean that pollution will increase in the host country. As found in some studies (Eskeland and Harrison, 1997; etc.), the FDI enterprises specialized in pollution-intensive industries generally employ production and abatement technologies that are more environment-friendly than those of their domestic competitors in host developing countries. This might be due to the fact that heavy emission may signal to investors that the FDI firms' production techniques are inefficient and hence reduce their expectation on the liability of these multinational corporations (Dasgupta, Laplante and Mamingi, 1997); or simply because investing in developing countries is part of the global-scale production arrangement strategy of multinational enterprises, so the adaptation of production technology to the local environmental standard is not necessary. If these FDI corporations replace the relatively less efficient domestic firms in the same production, we can expect a declining tendency in the total pollution of the developing host country. Moreover, the presence of FDI enterprises may also reinforce competition and urge domestic firms to enhance research and development activity and to increase their production efficiency, which will, in the long run, strengthen the technical efficiency of the whole host economy.

The FDI-pollution nexus is even more complicated if we relate our theoretical consideration to the often-mentioned three economic characteristics. They are economic growth (scale effect), industrial composition (composition effect) and environmental regulation stringency (technique effect), defined in Grossman (1995) as the three economic determinants of emission from production activities.
At first view, FDI entry is a decision partially depending on the environmental regulation stringency (technique effect) and the economic scale (scale effect) of the host country. At the same time, the structural linkage between FDI entry and final emission results is also built on their intermediation. Once foreign capital enters the host country, it can in turn exert influences on all three characteristics of the economy. For the case of China, firstly, FDI entry can accelerate economic growth, either through productivity reinforcement (Li et al., 2001; Chen and Demurger, 2002; Liu and Wang, 2003), or through technology diffusion (Thompson, 2002; Cheung and Lin, 2004; Lemoine and Unal-Kesenci, 2004), or through scale economy development (Tuan and Ng, 2004). Secondly, although the theories that predict the pattern of trade do not focus on ownership, given the similarity between the FDI location decision and trade specialization, most of the factors used in traditional theories to predict a country's trade patterns can be used to explain the composition impact of FDI. On one hand, the 'pollution haven' hypothesis suggests that China's relatively lax environmental regulation attracts the inflow of polluting foreign capital, which will in turn increase the proportion of polluting sectors in the industrial composition. On the other hand, given China's endowment in cheap labor, traditional comparative advantage theory expects that some polluting labor-intensive industries may also expand with the inflow of FDI. Copeland and Taylor (1994, 1997) and Antweiler et al. (2001) combined these two aspects and predicted that the final composition transformation incurred by international trade depends on the contrast in force between these two comparative advantages in the host economy. The same conclusion is also valid for the case of FDI.
Thirdly, FDI entry can also facilitate environmental regulation reinforcement, either by its direct contribution to the accumulation of pollution abatement capacity in the host economy or indirectly by its income-growth impact, which in turn reinforces public exigency for a better environment. Finally, FDI-led variations in all three emission determinants can further lead the final emission result to vary and affect the future FDI entry decision. Given these considerations, Letchumanan and Kodama (2000) indicated that the relationship between FDI and environment cannot be adequately understood by simply analyzing measurements of FDI flow in relation to environmental condition. We also need to consider the simultaneously occurring trends and underlying mechanisms that work through the changes in economic scale, industrial composition and technique effect.

3. Industrial SO2 emission and foreign direct investment situation in Chinese provinces

The regional disparity in openness, economic growth and environmental situation between Chinese provinces became more and more remarkable during the last 25 years of economic reform. Figure 2 shows the detailed regional distribution of industrial SO2 emission, accumulated FDI capital stock, economic growth and environmental regulation situation in 2001. Clearly, the rapid economic growth catalyzed by intensified FDI inflow does not benefit the 30 provinces in a homogeneous way. The high ratio of FDI capital stock to GDP is remarkably concentrated in the richer eastern coastal provinces. While both FDI capital stock and per capita GDP show an obvious decreasing tendency when we move from eastern coastal to western inland provinces, SO2 emission does not follow the same geographical distribution pattern.
The serious SO2 emission problem seems to appear more frequently in the central northern provinces, which have a long tradition of heavy industrial production, and in some southern provinces such as Guizhou, where the coal endowment contains a high concentration of sulfur. Another reason for the serious SO2 pollution problem is the lax environmental regulation applied in some provinces, such as Heilongjiang, Shandong, Fujian and Qinghai, where we observe the co-existence of a low average SO2 levy rate and high per capita SO2 emission.

Figure 3 further studies the correlation between economic growth, FDI stock, environmental regulation stringency and industrial SO2 emission by plotting them pairwise in the same diagram. Except for the inverted-U quadratic correlation between economic growth and FDI stock, for the other three pairs we cannot derive clear correlation directions given the low significance of the estimated coefficients. Obviously, the relationship between FDI and emission is more complicated than a simple positive or negative correlation.

(Insert Figure 3 about here)

4. The links between FDI and emission: The system of simultaneous equations

Considering the shortcomings of the existing empirical studies on the FDI-environment linkage mentioned above, the basic idea of this paper is to study the relationship between FDI and final industrial SO2 emission in China by exploring both the relationship between environmental regulation stringency and the FDI entry decision and the linkage from FDI entry to the final emission result within a structural framework.

A direct inspiration for the system constructed in this paper comes from Dean (1998). In her paper she studied the relationship between international trade and industrial wastewater emission in China through a simpler simultaneous system.
Her model supposes that international trade increases pollution through the 'pollution haven' effect, but that trade also contributes to economic growth, which in turn reduces emission since higher income strengthens public exigency for a better environment. Following the same reasoning, we suppose that the relationship between FDI and industrial SO2 emission can be described by the following five-equation simultaneous model.

Equation (1) describes the economic determinants of emission. Following Grossman (1995), we include the scale effect, the composition effect and the technique effect in this equation. Other things kept unchanged, an economy with a larger production scale emits more pollution, so we expect a positive coefficient for this term. The composition effect reflects the pollution performance of an economy's industrial composition. Given the same production scale, an industrial composition that contains a higher percentage of polluting sectors emits more pollution. Therefore, we anticipate a positive coefficient for the composition effect.
Simultaneous Equations Models (1)

The Nature of Simultaneous Equations Models
● In the previous study, we considered single-equation models: one Y and one or more X's.
● The cause-and-effect relationship runs from the X's to Y.
● There may be a two-way, or simultaneous, relationship between Y and the X's.
● It is then difficult to distinguish dependent and explanatory variables.
● Set up a simultaneous equations model in which the variables are jointly dependent, or endogenous.
● We cannot estimate the parameters of a single equation without taking into account the information provided by the other equations.
● The OLS estimator for a single equation in a simultaneous model is biased and inconsistent.
● Consider
$$Y_{1i} = \beta_{10} + \beta_{12} Y_{2i} + \gamma_{11} X_{1i} + u_{1i}$$
$$Y_{2i} = \beta_{20} + \beta_{21} Y_{1i} + \gamma_{21} X_{1i} + u_{2i}$$
$Y_{1i}$ and $u_{2i}$ are correlated, and $Y_{2i}$ and $u_{1i}$ are correlated, so OLS leads to inconsistent estimates.

Examples of Simultaneous Equations Models
● Example 1: Demand-and-supply model
Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + u_{1t}, \quad \alpha_1 < 0$
Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}, \quad \beta_1 > 0$
Equilibrium condition: $Q_t^d = Q_t^s$
where $Q^d$ = quantity demanded, $Q^s$ = quantity supplied, $t$ = time.
● Price P and quantity Q are determined by the intersection of the demand and supply curves. Demand and supply curves are linear.
● P and Q are jointly dependent.
● The demand curve will shift upward if $u_{1t}$ is positive and downward if $u_{1t}$ is negative.
● A shift in the demand curve changes both P and Q.
● A change in $u_{2t}$ will shift the supply curve and then change both P and Q.
● So $u_{1t}$ and P, and $u_{2t}$ and P, are correlated, which violates an important assumption of the CLRM.
● Example 2: Keynesian model of income determination
Consumption function: $C_t = \beta_0 + \beta_1 Y_t + u_t, \quad 0 < \beta_1 < 1$
Income identity: $Y_t = C_t + I_t \ (= S_t)$
where C = consumption expenditure, Y = income, I = investment (assumed exogenous), S = savings, t = time, u = stochastic disturbance term.
● $\beta_1$ is the marginal propensity to consume (MPC), lying between 0 and 1.
● C and Y are interdependent, and Y is not expected to be independent of the disturbance term.
● When $u_t$ shifts, the consumption function also shifts, which, in turn, affects Y.

The simultaneous equation bias: Inconsistency of OLS estimator
● Use the simple Keynesian model of income determination to show that the OLS estimator is inconsistent in a simultaneous model.
● We want to estimate the consumption function
$$C_t = \beta_0 + \beta_1 Y_t + u_t$$
● First, show that $Y_t$ and $u_t$ are correlated. Substituting the consumption function into the income identity:
$$Y_t = \frac{\beta_0}{1-\beta_1} + \frac{1}{1-\beta_1} I_t + \frac{1}{1-\beta_1} u_t$$
$$E(Y_t) = \frac{\beta_0}{1-\beta_1} + \frac{1}{1-\beta_1} I_t$$
so
$$\operatorname{cov}(Y_t, u_t) = E\{[Y_t - E(Y_t)][u_t - E(u_t)]\} = \frac{E[u_t^2]}{1-\beta_1} = \frac{\sigma^2}{1-\beta_1}$$
● Second, show that the OLS estimator $\hat{\beta}_1$ is an inconsistent estimator of $\beta_1$ because of the correlation between $Y_t$ and $u_t$. Writing $y_t = Y_t - \bar{Y}$:
$$\hat{\beta}_1 = \frac{\sum (C_t - \bar{C})(Y_t - \bar{Y})}{\sum (Y_t - \bar{Y})^2} = \frac{\sum C_t y_t}{\sum y_t^2} = \frac{\sum (\beta_0 + \beta_1 Y_t + u_t) y_t}{\sum y_t^2} = \beta_1 + \frac{\sum y_t u_t}{\sum y_t^2}$$
so
$$\operatorname{plim}(\hat{\beta}_1) = \beta_1 + \frac{\operatorname{plim}(\sum y_t u_t / N)}{\operatorname{plim}(\sum y_t^2 / N)} = \beta_1 + \frac{1}{1-\beta_1} \cdot \frac{\sigma^2}{\sigma_Y^2}$$
● $\operatorname{plim}(\hat{\beta}_1)$ will always be greater than $\beta_1$, because $0 < \beta_1 < 1$ and both variances are positive.

The Identification Problem
● Recall the demand and supply model. If we have time-series data on P and Q only and no additional information, can we estimate the demand function?
● We need to solve the identification problem.

Notations and Definitions
● Take the income determination model as an example:
Consumption function: $C_t = \beta_0 + \beta_1 Y_t + u_t, \quad 0 < \beta_1 < 1$
Income identity: $Y_t = C_t + I_t \ (= S_t)$
● Classification of variables:
- Endogenous variables: determined within the model.
- Predetermined variables: determined outside the model.
- Predetermined variables include current and lagged exogenous variables and lagged endogenous variables.
- A lagged endogenous variable is nonstochastic, hence a predetermined variable.
- Be careful to defend the classification.
● The $\beta$'s are known as the structural parameters or coefficients.
● Solve for the endogenous variables to derive the reduced-form equations.
● A reduced-form equation is one which expresses an endogenous variable solely in terms of the predetermined variables and the stochastic disturbances.
● Substitute the consumption function into the income identity:
$$Y_t = \Pi_0 + \Pi_1 I_t + w_t$$
where $\Pi_0 = \dfrac{\beta_0}{1-\beta_1}$, $\Pi_1 = \dfrac{1}{1-\beta_1}$, $w_t = \dfrac{u_t}{1-\beta_1}$.
Substitute the income identity into the consumption function:
$$C_t = \Pi_2 + \Pi_3 I_t + w_t$$
where $\Pi_2 = \dfrac{\beta_0}{1-\beta_1}$, $\Pi_3 = \dfrac{\beta_1}{1-\beta_1}$, $w_t = \dfrac{u_t}{1-\beta_1}$.
● $\Pi_1$ and $\Pi_3$ are impact multipliers.
● Reduced-form equations give the equilibrium values of the relevant endogenous variables.
● The OLS method can be applied to estimate the coefficients of the reduced-form equations.
● Structural coefficients can be "retrieved" from the reduced-form coefficients.

The identification problem
● The identification problem is whether numerical estimates of the parameters of a structural equation can be obtained from the estimated reduced-form coefficients.
● An equation may be identified or underidentified; an identified equation is either exactly identified or overidentified.

Underidentified
● Consider the demand and supply model, together with the market clearing condition.
(Insert equations)
● There are four structural coefficients but only two reduced-form coefficients, so the model cannot be solved.
● What does "underidentified" mean? See figures.
● An alternative way of looking at the identification problem: "mongrel" equations.
If the mongrel equation is observationally indistinguishable from the demand function, then the demand function is underidentified.

Just, or exact, identification
● Demand function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}, \quad \alpha_1 < 0, \ \alpha_2 > 0$
Supply function: $Q_t = \beta_0 + \beta_1 P_t + u_{2t}, \quad \beta_1 > 0$
● There is an additional variable in the demand equation.
● Derive the reduced-form equations.
● Five structural coefficients correspond to four reduced-form coefficients, so the model as a whole remains underidentified.
● The demand curve is underidentified, but the supply curve is identified.
● The "mongrel" equation is distinguishable from the supply function but not from the demand function.
● The presence of an additional variable in the demand function enables us to identify the supply function.
● Consider
Demand function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}, \quad \alpha_1 < 0, \ \alpha_2 > 0$
Supply function: $Q_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}, \quad \beta_1 > 0, \ \beta_2 > 0$
Exactly identified!

Overidentified
● Demand function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 R_t + u_{1t}$
Supply function: $Q_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$
● Solve the structural equations and get the reduced form:
$$P_t = \Pi_0 + \Pi_1 I_t + \Pi_2 R_t + \Pi_3 P_{t-1} + v_t$$
$$Q_t = \Pi_4 + \Pi_5 I_t + \Pi_6 R_t + \Pi_7 P_{t-1} + w_t$$
● Seven structural coefficients correspond to eight reduced-form coefficients, so there are eight equations in seven unknowns and multiple solutions. For example:
$$\beta_1 = \frac{\Pi_5}{\Pi_1} \quad \text{or} \quad \beta_1 = \frac{\Pi_6}{\Pi_2}$$
● The reason for the multiple solutions is that we have "too much" information to identify the supply curve.
● "Too much" is reflected in the exclusion of two variables from the supply function.
One should be enough.

Rules for identification
● One could solve the structural equations, get the reduced form, and compare the number of structural coefficients with the number of reduced-form coefficients, but there is no need for this time-consuming process.
● Order condition of identification:
M = number of endogenous variables in the model
m = number of endogenous variables in a given equation
K = number of predetermined variables in the model
k = number of predetermined variables in a given equation
In a model of M simultaneous equations, in order for an equation to be identified, the number of predetermined variables excluded from the equation must not be less than the number of endogenous variables included in that equation less 1, that is:
$$K - k \ge m - 1$$
● Check the previous examples.
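The check of the previous examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Equation` class, the helper names `order_condition` and `classify`, and the label `P_lag` for the lagged price are hypothetical and not part of the notes, and intercepts are not counted as predetermined variables. Note also that the order condition is necessary but not sufficient for identification, so the labels below report only what the counting rule says.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Equation:
    name: str
    endogenous: frozenset     # endogenous variables appearing in the equation
    predetermined: frozenset  # predetermined variables appearing in the equation


def order_condition(eq: Equation, model_predetermined: frozenset) -> str:
    """Classify one equation using the order condition K - k >= m - 1."""
    K = len(model_predetermined)  # predetermined variables in the whole model
    k = len(eq.predetermined)     # predetermined variables in this equation
    m = len(eq.endogenous)        # endogenous variables in this equation
    excluded, required = K - k, m - 1
    if excluded < required:
        return "underidentified"
    if excluded == required:
        return "exactly identified"
    return "overidentified"


def classify(equations):
    """Apply the order condition to every equation of a model."""
    model_predetermined = frozenset().union(*(eq.predetermined for eq in equations))
    return {eq.name: order_condition(eq, model_predetermined) for eq in equations}


QP = frozenset({"Q", "P"})  # both equations contain the endogenous Q and P

# The four demand-and-supply variants discussed in the notes.
models = {
    "no extra variables": [
        Equation("demand", QP, frozenset()),
        Equation("supply", QP, frozenset()),
    ],
    "income in demand": [
        Equation("demand", QP, frozenset({"I"})),
        Equation("supply", QP, frozenset()),
    ],
    "income in demand, lagged price in supply": [
        Equation("demand", QP, frozenset({"I"})),
        Equation("supply", QP, frozenset({"P_lag"})),
    ],
    "income and R in demand, lagged price in supply": [
        Equation("demand", QP, frozenset({"I", "R"})),
        Equation("supply", QP, frozenset({"P_lag"})),
    ],
}

results = {label: classify(eqs) for label, eqs in models.items()}
for label, verdict in results.items():
    print(label, verdict)
```

In this sketch the counting rule agrees with the coefficient-counting arguments in the notes: adding income to the demand function identifies the supply function but not the demand function, and excluding two predetermined variables from the supply function makes it overidentified.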
Copyright © 2011 by Beiting Cheng, Ioannis Ioannou, and George Serafeim
Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working

Corporate Social Responsibility and Access to Finance
Working Paper 11-130

Beiting Cheng, Harvard Business School
Ioannis Ioannou, London Business School
George Serafeim, Harvard Business School

May 18th, 2011

Abstract

In this paper, we investigate whether superior performance on corporate social responsibility (CSR) strategies leads to better access to finance. We hypothesize that better access to finance can be attributed to reduced agency costs, due to enhanced stakeholder engagement through CSR, and reduced informational asymmetries, due to increased transparency through non-financial reporting. Using a large cross-section of firms, we show that firms with better CSR performance face significantly lower capital constraints. The results are confirmed using an instrumental variables and a simultaneous equations approach. Finally, we find that the relation is primarily driven by social and environmental performance, rather than corporate governance.

Keywords: corporate social responsibility, sustainability, capital constraints, ESG (environmental, social, governance) performance

I. INTRODUCTION

In recent decades, a growing number of academics as well as top executives have been allocating a considerable amount of time and resources to Corporate Social Responsibility [1] (CSR) strategies. According to the latest UN Global Compact – Accenture CEO study [2] (2010), 93 percent of the 766 participant CEOs from all over the world declared CSR as an “important” or “very important” factor for their organizations’ future success.
On the demand side, consumers are becoming increasingly aware of firms’ CSR performance: a recent 5,000-people survey [3] by Edelman [4] revealed that nearly two thirds of those interviewed cited “transparent and honest business practices” as the most important driver of a firm’s reputation. Although CSR has received such a great amount of attention, a fundamental question still remaining unanswered is whether CSR leads to value creation, and if so, in what ways. The extant research so far has failed to give a definitive answer (Margolis and Walsh, 2003). In this paper, we examine one specific mechanism through which CSR may lead to better long-run performance: by lowering the constraints that the firm is facing when accessing funds to finance operations and undertake strategic projects.

To date, many studies have investigated the link between CSR and financial performance, and have found rather conflicting results [5]. According to McWilliams and Siegel (2000), conflicting results were due to the studies’ “several important theoretical and empirical limitations” (p.603). Others have argued that the studies suffered from “stakeholder mismatching” (Wood and Jones, 1995), the neglect of “contingency factors” (e.g. Ullmann, 1985), “measurement errors” (e.g. Waddock and Graves, 1997) and omitted variable bias (Aupperle, Carroll and Hatfield, 1985; Cochran and Wood, 1984; Ullmann, 1985).

[1] Here, we follow a long list of studies (e.g. Carroll, 1979; Wolfe and Aupperle, 1991; Waddock and Graves, 1997; Hillman and Keim, 2001; Waldman et al., 2006) in defining corporate social responsibility as: “a business organization’s configuration of principles of social responsibility, processes of social responsiveness, and policies, programs, and observable outcomes as they relate to the firm’s social relationships” (Wood, 1991: p.693).
[2] “A New Era of Sustainability. UN Global Compact-Accenture CEO Study 2010”, last accessed July 28th, 2010 at: (https:///sustainability/research_and_insights/Pages/A-New-Era-of-Sustainability.aspx)
[3] Mckinght, L., 2011. “Companies that do good also do well”, Market Watch, The Wall Street Journal (Digital Network), last accessed April 11th, 2011 at: /story/companies-that-do-good-also-do-well-2011-03-23
[4] Edelman is a leading independent global PR firm that has been providing public relations counsel and strategic communications services for more than 50 years.
[5] Margolis and Walsh (2003) and Orlitzky, Schmidt and Rynes (2003) provide comprehensive reviews of the extant literature.

In this paper, we focus on the impact of CSR on the firm’s capital constraints. By “capital constraints” we refer to market frictions [6] that may prevent a firm from funding all desired investments. This inability to obtain finance may be “due to credit constraints or inability to borrow, inability to issue equity, dependence on bank loans, or illiquidity of assets” (Lamont et al., 2001). Prior studies have suggested that capital constraints play an important role in strategic decision-making, since they directly affect the firm’s ability to undertake major investment decisions and also influence the firm’s capital structure choices (e.g., Hennessy and Whited, 2007). Moreover, past research has found that capital constraints are associated with a firm’s subsequent stock returns (e.g. Lamont et al., 2001).

There are several reasons why investors would pay attention to a firm’s CSR strategies. First, firm activities that may affect [7] long-term financial performance are taken into account by market participants when assessing a firm’s long-run value-creating potential (Ioannou and Serafeim, 2010a; Groysberg et al., 2011; Previts and Bricker, 1994). Moreover, a growing number of investors use CSR information as an important criterion for their investment decisions, what is currently known as “socially responsible investing” (SRI). For example, in 2007 mutual funds that invested in socially responsible firms had assets under management of more than $2.5 trillion and $2 trillion in the United States and Europe respectively. In Canada, Japan and Australia, the corresponding numbers were $500, $100 and $64 billion respectively (Ioannou and Serafeim, 2010a). Total assets under management by socially responsible investors have grown considerably in the last ten years in countries such as the United States, United Kingdom, and Canada. In addition, the emergence of several CSR rankings and ratings firms (such as Thomson Reuters ASSET4 and KLD), the widespread dissemination of data on ESG performance by Bloomberg terminals, as well as the formation of teams to analyze CSR data within large banks such as J.P. Morgan Chase and Deutsche Bank, [8] highlight the growing demand for and increasing use of CSR information. Furthermore, projects like the Enhanced Analyst Initiative [9] (EAI), which allocates a minimum of 5 percent of trading commissions to brokers that integrate analysis of CSR data into their mainstream research, have further increased investor incentives to incorporate CSR data in their analysis [10].

[6] Consistent with prior literature in corporate finance (e.g. Lamont et al., 2001), we do not use the term to mean financial distress, economic distress or bankruptcy risk.
[7] Margolis and Walsh (2003) perform a meta-analysis of the CSR studies and find that the link between CSR and financial performance is small, albeit it is positive and statistically significant.
Finally, in several countries around the world, governments have adopted laws and regulations that mandate CSR reporting (Ioannou and Serafeim, 2011) as part of efforts to increase the availability of CSR data and bring transparency around nonfinancial performance.

The thesis of this paper is that firms with better CSR performance face fewer capital constraints. This is due to several reasons. First, superior CSR performance is directly linked to better stakeholder engagement, which in turn implies that contracting with stakeholders takes place on the basis of mutual trust and cooperation (Jones, 1995). Furthermore, as Jones (1995) argues, “because ethical solutions to commitment problems are more efficient than mechanisms designed to curb opportunism, it follows that firms that contract with their stakeholders on the basis of mutual trust and cooperation […] will experience reduced agency costs, transaction costs and costs associated with team production” (Foo, 2007). In other words, superior stakeholder engagement may directly limit the likelihood of short-term opportunistic behavior (Benabou and Tirole, 2010; Ioannou and Serafeim, 2011) by reducing overall contracting costs (Jones, 2005).

8 Cobley, M. 2009. “Banks Cut Back Analysis on Social Responsibility”, The Wall Street Journal, June 11th 2009.
9 An initiative established by institutional investors with assets totaling more than US$1 trillion.
10 “Universal Ownership: Exploring Opportunities and Challenges”, Conference Report, April 2006, Saint Mary’s College of California, Center for the Study of Fiduciary Capitalism and Mercer Investment Consulting.

Moreover, firms with better CSR performance are more likely to disclose their CSR activities to the market (Dhaliwal et al., 2011) to signal their long-term focus and differentiate themselves (Spence, 1973; Benabou and Tirole, 2010).
In turn, reporting of CSR activities: a) increases transparency around the social and environmental impact of companies and their governance structure, and b) may change internal management practices by creating incentives for companies to better manage their relationships with key stakeholders such as employees, investors, customers, suppliers, regulators, and civil society (Ioannou and Serafeim, 2011). Therefore, the increased availability of data about the firm reduces informational asymmetries between the firm and investors (e.g. Botosan, 1997; Khurana and Raman, 2004; Hail and Leuz, 2006; Chen et al., 2009; El Ghoul et al., 2010), leading to lower capital constraints (Hubbard, 1998).

In fact, the rapid growth of available capital through SRI funds in recent years (Ioannou and Serafeim, 2010a), and the corresponding expansion of potential financiers that base their investment decisions on non-financial information (Kapstein, 2001), may well be partially due to the increased transparency and an endorsement of the long-term orientation that firms with superior CSR performance adopt. In sum, because of lower agency costs through stakeholder engagement and increased transparency through nonfinancial reporting, we predict that a firm with superior CSR performance will face lower capital constraints.

To investigate the impact that CSR has on capital constraints, we use data from Thomson Reuters ASSET4¹¹ for 2,439 publicly listed firms during the period 2002 to 2009. Thomson Reuters ASSET4 rates firms’ performance on three dimensions (“pillars”) of CSR: social, environmental and corporate governance. The dependent variable of interest is the “KZ index”, first advocated by Kaplan and Zingales (1997) and subsequently used extensively by scholars (e.g.
Lamont et al., 2001; Baker et al., 2003; Almeida et al., 2004; Bakke and Whited, 2010; Hong et al., 2011) as a measure of capital constraints.

The results confirm that firms with better CSR performance face lower capital constraints. We test the robustness of the results by substituting the KZ index with an indicator variable for stock repurchase activity as a proxy for capital constraints, and we find similar results. Importantly, the results remain unchanged when we implement an instrumental variables approach and a simultaneous equations model, mitigating potential endogeneity concerns or correlated omitted variables issues, and providing evidence for a causal argument. Finally, we disaggregate CSR performance into its three components to gain insight as to which pillars have the greatest impact on capital constraints. We find that the result is driven primarily by social and environmental performance.

This paper contributes to both the theoretical and empirical literature on CSR. Although many studies have explored the link between CSR and value creation, few have focused on the crucial role that capital markets play as a mechanism through which CSR may translate into tangible benefits for firms (e.g. Derwall and Verwijmeren, 2007; Goss and Roberts, 2011; Sharfman and Fernando, 2008; Chava, 2010). We contribute to this literature by showing the impact that CSR has on the firm’s ability to access finance in capital markets.

11 ASSET4 is widely used by investors as a source for environmental, social and governance performance data. Some of the most prominent investment houses in the world, such as BlackRock, use the ASSET4 data. See: /content/financial/pdf/491304/2011_04_blackrock_appoints_esg.pdf

Furthermore, this study sheds light on the core strategic problem: understanding persistent performance heterogeneity across firms in the long-run.
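As a hedged illustration of the dependent variable named above, the following sketch computes a KZ index from five accounting ratios. The coefficients are the ones commonly attributed to Lamont et al. (2001); they are not stated in this paper and are an assumption here, as are the class and field names, which are purely illustrative. Higher values indicate tighter capital constraints.

```python
from dataclasses import dataclass

# Assumption: coefficients as commonly reported from Lamont et al. (2001).
# They do not appear in this paper and are quoted only for illustration.
KZ_COEFFICIENTS = {
    "cash_flow_to_capital": -1.001909,
    "tobins_q": 0.2826389,
    "debt_to_total_capital": 3.139193,
    "dividends_to_capital": -39.3678,
    "cash_to_capital": -1.314759,
}


@dataclass
class FirmYear:
    """One firm-year observation (hypothetical field names)."""
    cash_flow_to_capital: float   # cash flow / lagged capital
    tobins_q: float               # market-to-book proxy
    debt_to_total_capital: float  # leverage
    dividends_to_capital: float   # dividends / lagged capital
    cash_to_capital: float        # cash holdings / lagged capital


def kz_index(firm: FirmYear) -> float:
    """Linear combination of the five ratios; higher means more constrained."""
    return sum(coef * getattr(firm, name) for name, coef in KZ_COEFFICIENTS.items())


# Two invented firms that differ only in leverage.
low_debt = FirmYear(0.1, 1.5, 0.3, 0.02, 0.1)
high_debt = FirmYear(0.1, 1.5, 0.6, 0.02, 0.1)
print(kz_index(low_debt) < kz_index(high_debt))
```

Under these assumed coefficients, more leverage raises the index, while more cash flow, dividends or cash holdings lower it, which matches the reading of the index as a measure of constraint.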
We argue that differential ability across firms to implement CSR strategies results in significant variation in CSR performance, which in turn is directly linked to the firm’s ability to access capital. Differential access to capital implies variation in the ability of firms to finance major strategic investments, leading to direct performance implications in the long-run. In other words, by understanding the consequences of variability in CSR strategies we contribute towards understanding performance heterogeneity across firms in the long-run.

The remainder of the paper is organized as follows. Section II discusses the prior literature linking CSR to value creation, and prior literature linking capital constraints with firm performance. Section III presents the theoretical argument and derives our main hypothesis. Section IV presents the data sources and the empirical methods. Section V presents the results, and Section VI discusses the findings and the limitations of the study, and concludes.

II. PRIOR LITERATURE

Corporate Social Responsibility and Firm Performance

Many studies have investigated the link between CSR and financial performance, both from a theoretical and from an empirical standpoint. On the one hand, prior theoretical work rooted in neoclassical economics argued that CSR unnecessarily raises a firm’s costs, and thus puts the firm in a position of competitive disadvantage vis-à-vis competitors (Friedman, 1970; Aupperle et al., 1985; McWilliams and Siegel, 1997; Jensen, 2002).
Other studies have argued that employing valuable firm resources to engage in socially responsible strategies results in significant managerial benefits rather than financial benefits to the firm’s shareholders (Brammer and Millington, 2008).

On the other hand, several scholars have argued that CSR may have a positive impact on firms by providing better access to valuable resources (Cochran and Wood, 1984; Waddock and Graves, 1997), attracting and retaining higher quality employees (Turban and Greening, 1996; Greening and Turban, 2000), better marketing for products and services (Moskowitz, 1972; Fombrun, 1996), creating unforeseen opportunities (Fombrun et al., 2000), and gaining social legitimacy (Hawn et al., 2011). Furthermore, others have argued that CSR may function in similar ways as advertising does and therefore increase overall demand for products and services and/or reduce consumer price sensitivity (Dorfman and Steiner, 1954; Navarro, 1988; Sen and Bhattacharya, 2001; Milgrom and Roberts, 1986), as well as enable the firm to develop intangible resources (Gardberg and Fombrun, 2006; Hull and Rothenberg, 2008; Waddock and Graves, 1997). Within stakeholder theory (Freeman, 1984; Freeman et al., 2007; Freeman et al., 2010), which suggests that CSR is synonymous with effective management of multiple stakeholder relationships, scholars have argued that identifying and managing ties with key stakeholders can mitigate the likelihood of negative regulatory, legislative or fiscal action (Freeman, 1984; Berman et al., 1999; Hillman and Keim, 2001), attract socially conscious consumers (Hillman and Keim, 2001), or attract financial resources from socially responsible investors (Kapstein, 2001).
CSR may also lead to value creation by protecting and enhancing corporate reputation (Fombrun and Shanley, 1990; Fombrun, 2005; Freeman et al., 2007).

Empirical examinations of the link between CSR and corporate financial performance have resulted in contradictory findings, ranging from a positive to a negative relation, to a U-shaped or even to an inverse-U shaped relation (Margolis and Walsh, 2003). According to McWilliams and Siegel (2000), conflicting results were due to “several important theoretical and empirical limitations” (p.603) of prior studies; others have argued that prior work suffered from “stakeholder mismatching” (Wood and Jones, 1995), the neglect of “contingency factors” (e.g. Ullmann, 1985), “measurement errors” (e.g. Waddock and Graves, 1997) and omitted variable bias (Aupperle et al., 1985; Cochran and Wood, 1984; Ullmann, 1985).

In this paper, we shed light on the link between CSR and value creation by focusing on the role of capital markets as a specific mechanism through which CSR strategies may translate into economic value in the long run. More specifically, we argue that better CSR performance leads to lower capital constraints, which in turn has a positive impact on performance. Accordingly, the following subsection briefly reviews prior literature on the link between capital constraints and firm performance.

Capital Constraints and Firm Performance

Firms undertake strategic investments to achieve competitive advantage and thus superior performance. The ability of firms to undertake such investments is, in turn, directly linked to the idiosyncratic capital constraints that the firm is facing. Therefore, to understand the link between capital constraints and performance we first focus on the impact of capital constraints on investments.
The theory of investment was shaped by Modigliani and Miller's seminal paper in 1958, which predicted that “a firm's financial status is irrelevant for real investment decisions in a world of perfect and complete capital markets.” The neoclassical economists derived the investment function from the firm's profit-maximizing behavior and showed that investment depends on the marginal productivity of capital, interest rate, and tax rules (Summers et al. 1981; Mankiw 2009). However, subsequent studies in equity and debt markets showed that cash flow (i.e. internal funds) also plays a significant role in determining the level of investment (Blundell et al. 1990; Whited 1992; Hubbard and Kashyap 1992). Importantly, studies have shown that financially constrained firms are more likely to reduce investments in a broad range of strategic activities (Hubbard, 1998; Campello et al., 2010), including inventory investment (Carpenter et al., 1998) and R&D expenditures (Himmelberg and Petersen, 1994; Hall and Lerner, 2010), thus significantly constraining the capacity of the firm to grow over time.

Another set of studies has explored the relation between capital constraints and firm entry and exit decisions. Using entrepreneurs' personal tax-return data, Holtz-Eakin, Joulfaian, and Rosen (1994a) considered inheritance as an exogenous shock on the individual’s wealth and found that the size of the inheritance had a significant effect on the probability of becoming an entrepreneur. A follow-up paper (Holtz-Eakin, Joulfaian, and Rosen 1994b) has shown that firms founded by entrepreneurs with a larger inheritance (thus, lower capital constraints) are more likely to survive.
Aghion, Fally and Scarpetta (2007) develop a similar argument by using firm-level data from 16 economies, comparing new firm entry and post-entry growth trajectories.

Another stream of literature that considers incumbents as well as new entrants (see Levine (2005) for a review of relevant studies) argues that capital constraints tend to affect smaller, newer and riskier firms relatively more and channel capital to where the return is highest. As a result, countries with better-functioning financial systems that can ease such constraints experience faster industrial growth. Given the idiosyncratic levels of constraints faced by companies of various sizes, scholars started to look at capital constraints as an explanation for why small companies pay lower dividends, become more highly levered and grow more slowly (Cooley and Quadrini 2001; Cabral and Mata 2003). For example, Carpenter and Petersen (2002) showed that a firm's asset growth is constrained by internal capital for small U.S. firms, and that firms that are able to raise more external funds enjoy a higher growth rate. Becchetti and Trovato (2002) found comparable results with a sample of Italian firms, and Desai, Foley and Forbes (2008) confirmed the same relation in a currency crisis setting. Finally, Beck et al. (2005), using survey data of a panel of global companies, documented that firm performance is vulnerable to various financial constraints and that small companies are disproportionately affected due to tighter limitations.
In sum, the literature to date has revealed that seeking ways to relax capital constraints is crucial to firm-level survival and growth, industry-level expansion and country-level development.

III. THEORETICAL DEVELOPMENT

Based on neoclassical economic assumptions that postulate a flat supply curve for funds in the capital market at the level of the risk-adjusted real interest rate, Hennessy and Whited (2007) argued that “a CFO can neither create nor destroy value through his financing decisions in a world without frictions”. However, because of market imperfections such as informational asymmetries (Greenwald, Stiglitz and Weiss 1984; Myers and Majluf 1984) and agency costs (Bernanke and Gertler 1989, 1990), the supply curve for funds is effectively upward sloping rather than horizontal¹² at levels of capital that exceed the firm’s net worth. In other words, when the likelihood of agency costs is high (e.g. opportunistic behavior by managers) and the capital required by the firm for investments exceeds the firm’s net worth (and it is thus uncollateralized), lenders are compensated for their information (and/or monitoring) costs by charging a higher interest rate. The greater these market frictions are, the steeper the supply curve and the higher the cost of external financing.

12 For a full exposition of the model, based on neoclassical assumptions, see Hubbard (1998), p. 195-198.

It follows then that adoption and implementation of firm strategies that reduce informational asymmetries or reduce the likelihood of agency costs can shrink the wedge between the external and the internal cost of capital by making the supply curve for funds less steep. Equivalently, for a given interest rate, the firm is able to obtain higher amounts of capital.
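The argument above can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, a supply curve that is flat at a base rate up to the firm's net worth and then rises linearly; the linear form, the function names and all numbers are assumptions rather than part of the model in Hubbard (1998).

```python
def cost_of_funds(capital: float, net_worth: float,
                  base_rate: float, friction_slope: float) -> float:
    """Interest rate charged for `capital` under an assumed piecewise-linear curve.

    Funds up to `net_worth` are priced at `base_rate`; each extra unit of
    uncollateralized capital adds `friction_slope` to the rate.
    """
    uncollateralized = max(0.0, capital - net_worth)
    return base_rate + friction_slope * uncollateralized


def max_capital_at_rate(rate: float, net_worth: float,
                        base_rate: float, friction_slope: float) -> float:
    """Largest amount of capital obtainable without paying more than `rate`."""
    if rate < base_rate:
        return 0.0
    if friction_slope == 0.0:
        return float("inf")  # frictionless case: flat supply curve
    return net_worth + (rate - base_rate) / friction_slope


# Hypothetical firm: net worth 100, base rate 5%, willing to pay up to 8%.
steep = max_capital_at_rate(0.08, 100.0, 0.05, friction_slope=0.001)
flatter = max_capital_at_rate(0.08, 100.0, 0.05, friction_slope=0.0005)
print(steep, flatter)
```

Halving the assumed friction slope flattens the curve, so at the same interest rate the hypothetical firm can raise more capital, which is the sense in which lower frictions relax capital constraints.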
Better access to capital, in turn, favorably impacts overall strategy by enabling the firm to undertake major investment decisions that otherwise would have been unprofitable, and/or by influencing the firm’s capital structure choices (e.g., Hennessy and Whited, 2007).

We argue that firms with better CSR performance face lower capital constraints compared to firms with worse CSR performance. This is because superior CSR performance reduces market frictions through two mechanisms. First, superior CSR performance is the result of the firm committing to and contracting with stakeholders on the basis of mutual trust and cooperation (Jones, 1995; Andriof and Waddock, 2002).

Evaluating the Impact of Simultaneous Multithreading on Network Servers Using Real Hardware

Yaoping Ruan (yruan@), Vivek S. Pai (vivek@), Erich Nahum† (nahum@), John M. Tracey† (traceyj@)
Department of Computer Science, Princeton University, Princeton, NJ 08544
†IBM T.J. Watson Research Center, Yorktown Heights, NY 10598

ABSTRACT

This paper examines the performance of simultaneous multithreading (SMT) for network servers using actual hardware, multiple network server applications, and several workloads. Using three versions of the Intel Xeon processor with Hyper-Threading, we perform macroscopic analysis as well as microarchitectural measurements to understand the origins of the performance bottlenecks for SMT processors in these environments. The results of our evaluation suggest that the current SMT support in the Xeon is application and workload sensitive, and may not yield significant benefits for network servers.

In general, we find that enabling SMT on real hardware usually produces only slight performance gains, and can sometimes lead to performance loss. In the uniprocessor case, previous studies appear to have neglected the OS overhead in switching from a uniprocessor kernel to an SMT-enabled kernel. The performance loss associated with such support is comparable to the gains provided by SMT. In the 2-way multiprocessor case, the higher number of
memory references from SMT often causes the memory system to become the bottleneck, offsetting any processor utilization gains. This effect is compounded by the growing gap between processor speeds and memory latency. In trying to understand the large gains shown by simulation studies, we find that while the general trends for microarchitectural behavior agree with real hardware, differences in sizing assumptions and performance models yield much more optimistic benefits for SMT than we observe.

Categories and Subject Descriptors: C.4 PERFORMANCE OF SYSTEMS: Design studies
General Terms: Measurement, Performance.
Keywords: Network Server, Simultaneous Multithreading (SMT).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS’05, June 6–10, 2005, Banff, Alberta, Canada.
Copyright 2005 ACM 1-59593-022-1/05/0006...$5.00.

1. INTRODUCTION

Simultaneous multithreading (SMT) has recently moved from simulation-based research to reality with the advent of commercially available SMT-capable microprocessors. Simultaneous multithreading allows processors to handle multiple instruction streams in the pipeline at the same time, allowing higher functional unit utilization than is possible from a single stream. Since the hardware support for this extra parallelism seems to be minimal, SMT has the potential to increase system throughput without significantly affecting system cost. While academic research on SMT processors has been taking place since the mid-1990’s [8, 37], the recent availability of SMT-capable Intel Xeon processors allows performance analysts to perform direct measurements of SMT benefits under a wide range of workloads.

One of the biggest
opportunities for SMT is in network servers, such as Web, FTP, or file servers, where tasks are naturally parallel, and where high throughput is important. While much of the academic focus on SMT has been on scientific or computation-intensive workloads, suitable for the High Performance Computing (HPC) community, a few simulation studies have explicitly examined Web server performance [18, 26]. The difficulty of simulating server workloads versus HPC workloads is in accurately handling operating system (OS) behavior, including device drivers and hardware-generated interrupts. While processor-evaluation workloads like SPEC CPU [33] explicitly attempt to avoid much OS interaction, server workloads like SPECweb [34] often include much OS, filesystem, and network activity.

While simulations clearly provide more flexibility than actual hardware, evaluation on real hardware also has its advantages, including more realism and faster evaluation. Using actual hardware, researchers can run a wider range of workloads (e.g., bottom-half heavy workloads) than is feasible in simulation-based environments. Particularly for workloads with large data set sizes that are slow to reach steady state, the time difference between simulation and evaluation can be substantial. The drawback of hardware, however, is the lack of the configuration options that are available in simulation. Some flexibility in the hardware analysis can be gained by using processors with different characteristics, though this approach is clearly much more constrained than simulators.

This paper makes four contributions:
• We provide a thorough experimental evaluation of SMT for network servers, using five different software packages and three hardware platforms. We believe this study is more complete than any related work previously published.
• We show that SMT has a smaller performance benefit than expected for network servers, both in the uniprocessor and dual-processor cases. In each case, we identify the macro-level issues that affect performance.
• We perform a
microarchitectural evaluation of performance using the Xeon’s hardware performance counters. The results provide insight into the instruction-level issues that affect performance on these platforms.
• We compare our measurements with earlier simulation results to understand what aspects of the simulated processors yielded much larger performance gains. We discuss the feasibility of these simulation models, both in the context of current hardware, and with respect to expected future trends.

Our evaluation suggests that the current SMT support is sensitive to application and workloads, and may not yield significant benefits for network servers, especially for OS-heavy workloads. We find that enabling SMT usually produces only slight performance gains, and can sometimes lead to performance loss. In the uniprocessor case, simulations appear to have neglected the OS overhead in switching from a uniprocessor kernel to an SMT-enabled kernel. The performance loss associated with such support is comparable to the gains provided by SMT. In the 2-way multiprocessor case, the higher number of memory references from SMT often causes the memory system to become the bottleneck, offsetting any processor utilization gains. This effect is compounded by the growing gap between processor speeds and memory latency. We find that SMT on the Xeon tends to provide better gains when coupled with large L3 caches. By comparing performance gains across variants of the Xeon, we argue that such caches will only become more crucial for SMT as clock rates increase. If these caches continue to be one of the differentiating factors between commodity and higher-cost processors, then commodity SMT will see eroding gains going forward. We believe this observation also applies to architectures other than the Xeon, since SMT only yields benefits when it is able to utilize more processor resources.

Using these results, we can also examine how simulation suggested a much more optimistic scenario for SMT, and why it differs
from what we observe. For example, when calculating speedups, none of the simulations used a uniprocessor kernel when measuring the non-SMT base case. Furthermore, the simulations use cache sizes that are larger than anything commonly available today. These large caches appear to have supported the higher number of threads used, yielding much higher benefits than what we have seen, even when comparing with the same number of threads. We do not believe that the processor models used in the simulation are simply more aggressive than what is available today or likely to be available in the near future. Instead, using comparable measurements from the simulations and existing hardware, we show that the type of processors commonly modeled in the simulations are unlikely to ever appear as slightly-modified mainstream processors. We argue that they have characteristics that suggest they could be built specifically for SMT, and would sacrifice single-thread performance.

The rest of this paper is organized as follows: we provide some background on SMT, the Xeon, and our experimental setup in Section 2. We measure SMT’s effect on throughput and perform a microarchitectural analysis in Sections 3 and 4. In Section 5 we compare our measurement results to previous simulation studies. The impact of other workloads is discussed in Section 6. Section 7 discusses related work, and we conclude in Section 8.

2. BACKGROUND

In this section we present an overview of the Intel Xeon processor with Hyper-Threading (Intel’s term for SMT), then describe our experimental platform including hardware parameters and server configuration, our workloads and measurement methodology.
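To make the base-case point from the introduction concrete, the following minimal sketch shows how the choice of non-SMT baseline changes the reported speedup. The throughput figures are invented for illustration and are not measurements from this paper; the function and variable names are likewise hypothetical.

```python
def speedup(throughput: float, baseline: float) -> float:
    """Relative gain of `throughput` over `baseline` (0.30 means +30%)."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return throughput / baseline - 1.0


# Hypothetical throughputs in Mb/s, invented for illustration only.
uniprocessor_kernel = 1000.0  # one processor, uniprocessor kernel, SMT off
smp_kernel_no_smt = 870.0     # one processor, SMP kernel, SMT off
smp_kernel_smt = 1130.0       # one processor, SMP kernel, SMT on

gain_vs_smp = speedup(smp_kernel_smt, smp_kernel_no_smt)
gain_vs_up = speedup(smp_kernel_smt, uniprocessor_kernel)
print(f"vs SMP kernel: {gain_vs_smp:.1%}, vs uniprocessor kernel: {gain_vs_up:.1%}")
```

With these made-up numbers the same SMT result looks far better against the SMP-kernel baseline than against the uniprocessor-kernel baseline, because the former already includes the multiprocessor kernel's overhead.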
2.1 SMT Architecture

The SMT architecture was proposed in the mid-1990’s, and has been an active area for academic research since that time [16, 36, 37], but the first general-purpose processor with SMT features was not shipped until 2003. The main intent of SMT is to convert thread-level parallelism into instruction-level parallelism. In SMT- …

[Table residue preceding Table 2; caption and layout not recovered. Surviving entries: clock rate; pipeline; fetch: round robin for logical processors; 3 µops per cycle; caches, branch predictors, decoder logic; duplicated resources: ITLB, renaming logic; µop queue, re-ordering buffer.]

[Table 2 body, partially recovered. Columns: Level, Associativity, Latency (cycles). Surviving entries: 12K µops; 6 µops; D-L1; 4 way; 2; 512KB; 128 bytes; Memory, N/A, 225-344; DTLB: 64 entries, 20 cycles miss penalty.]
Table 2: Intel Xeon memory hierarchy information. The latency cycles of each level of the memory hierarchy include the cache miss time of the previous level.

… absolute time. The absolute latency is relatively constant since the FSB speed is the same. The impact on bandwidth is 22%, much less than the clock speed difference – the 2.0GHz system has a read bandwidth of 1.8GB/sec while the 3.06GHz system has a value of 2.2GB/sec. While higher bandwidth is useful for copy-intensive applications, the memory latency is more important to applications that perform heavy pointer-chasing. Early Web servers performed significant numbers of memory copies to transfer data, but with the introduction of zero-copy [22] support into servers, copy bandwidth is less of an issue.

Our testing harness consists of 12 uniprocessor client machines with AMD Duron processors at 1.6GHz. The aggregate processor power of the clients is enough to ensure that the clients are never the bottleneck. To ensure adequate network bandwidth, the clients are partitioned into four groups of three machines. Each group is connected to the server via a separate switched Gigabit Ethernet, using four Intel e1000 MT server adapters at the server.

We compare five different OS/processor configurations, based on whether a uniprocessor or multiprocessor kernel is used, and whether SMT is enabled or disabled. Using the BIOS
support and OS boot parameters, we can select between one or two processors, and enable or disable SMT. For most of our tests, we use a multiprocessor-enabled (SMP) kernel, since the OS sees an SMT-enabled processor as two logical processors. However, when we run with one physical processor and SMT disabled, we also test on a uniprocessor kernel. These combinations yield the five configurations studied in this paper: one processor with uniprocessor kernel (1T-UP), one processor with SMP kernel (1T-SMP), one processor with SMP kernel and SMT enabled (2T), two processors (2P), and two processors with SMT enabled (4T). Key features of the five configurations and their names used in this paper are shown in Table 3. The operating system on the server is Linux, with kernel version 2.6.8.1. This version includes optimizations for SMT, which we enable. The optimizations are described next.

2.3 Kernel Versions and Overheads

In evaluating SMT performance on uniprocessors, it is important to understand the distinction between the types of kernels available, because they affect the delivered performance. Uniprocessor kernels, as the name implies, are configured to only support one processor, regardless of how many physical processors are in the system. Multiprocessor kernels are configured to take advantage of all processors in the system using a single binary image. While intended for multiple processors, they are designed to operate without problems on a single processor.

Uniprocessor kernels can make assumptions about what is possible during execution, since all sources of activity are taking place on one processor. Specifically, the OS can make two important assumptions …

[Table 3, partially recovered; only two of the five columns survive. 1T-SMP: 1 CPU, SMP kernel: Yes, SMT enabled: No. 2P: 2 CPUs, SMP kernel: Yes, SMT enabled: No.]

… TUX [38], and Haboob [40]. Each server has one or more distinguishing features which increase the range of systems we study. All of the servers are written in C, except Haboob, which uses Java.
TUX is in-kernel, while all of the others are user-space. Flash and Haboob are event-driven, but Haboob also uses threads to isolate different steps of request processing. We run Apache in two configurations – with multiple processes (dubbed Apache-MP), and multiple threads (dubbed Apache-MT) using Linux kernel threads, because the Linux 2.6 kernel has better support for threads than the 2.4 series, and the Xeon has different cache sharing for threaded applications. Threaded applications share the same address space register while multi-process applications usually have different registers. Flash has a main process handling most of the work with helpers for disk IO access. We run the same number of Flash main processes as the number of hardware contexts. TUX uses a thread-pool model, where multiple threads handle ready events. With the exception of Haboob, all of the servers use the zero-copy interfaces available on Linux, reducing memory copy overhead when sending large files. For all of the servers, we take steps described in the literature to optimize their performance. While performance comparison among the servers is not the focus of this paper, we are interested in examining performance characteristics of SMT on these different software styles.

We use the SPECweb96 [34] benchmark mostly because it was used in previous simulation studies. Compared to its successor, the SPECweb99 benchmark, it spends more time in the kernel because all requests are static, which resembles other server workloads such as FTP and file servers. We also include SPECweb99 benchmark results for comparison. SPECweb is intended to measure a self-scaling capacity metric, which means that the workload characteristics change in several dimensions for different load levels. To simplify this benchmark while retaining many of its desirable properties, we use a more tractable subset when measuring bandwidths. In particular, we fix the data set size of the workload to 500MB, which fits in the physical memory of our machine. We perform
measurements only after an initial warm-up phase, to ensure that all necessary files have been loaded into memory. During the bandwidth tests, no disk activity is expected to occur. We disable logging, which causes significant performance losses in some servers. SPECweb99 measures the number of simultaneous connections each server is able to sustain while providing the specified quality of service to each connection. The SPECweb99 client software introduces latency between requests to decrease the per-connection bandwidth. SPECweb96 does not have this latency, allowing all clients to issue requests in a closed loop, infinite-demand model. We use 1024 simultaneous connections, and report the aggregate response bandwidth received by the clients.

We use a modified version of OProfile [20] to measure the utilization of microarchitectural resources via the Xeon’s performance-monitoring events. OProfile ships with the Linux kernel and is able to report user, kernel or aggregated event values. OProfile operates similarly to DCPI [1], using interrupt-based statistical sampling of event counters to determine processor activity without much overhead. We find that for our experiments, the measurement overhead is generally less than 1%. While OProfile supports many event counts available on the Xeon, we enhance the released code to support several new events, such as L1 data cache miss, DTLB miss, memory loads, memory stores, resource stalls, etc.

3. SMT PERFORMANCE

In this section we evaluate the throughput improvement of SMT in both uniprocessor and multiprocessor systems. Particular attention is given to the comparison between configurations with and without SMT enabled, and kernels with and without multiprocessor support. We first analyze trends at a macroscopic level, and then use microarchitectural information to understand what is causing the macroscopic behavior. Our bandwidth result for the basic 3.06 GHz Xeon, showing five servers and five OS/processor configurations, can be seen in Figure 2. Results
for the 2.0 GHz processor and the 3.06 GHz processor with L3 cache are shown in Figures 1 and 3, respectively. For each server, the five bars indicate the maximum throughput achieved using the specified number of processors and OS configuration.

While bandwidth is influenced by both the server software and the OS/processor configuration, the server software usually has a large effect (and in this case, the dominant effect) on bandwidth. Heavily optimized servers like Flash and TUX are expected to outperform Apache, which is designed for flexibility and portability instead of raw performance. The relative performance of Apache, Flash, and Haboob is in line with previous studies [28]. TUX's relative performance is somewhat surprising, since we assumed an in-kernel server would beat all other options. To verify that it was being run correctly, we consulted its author to confirm that it was properly configured for maximum performance. We surmise that its performance is due to its emphasis on dynamic content, which is not exercised in this portion of our testing. Haboob's low performance can be attributed both to its use of Java and to its lack of support for Linux's sendfile system call (and, as a result, TCP checksum offload). For in-memory workloads, the CPU is at full utilization, so the extra copying, checksumming, and language-related overheads consume processor cycles that could otherwise be spent processing other requests.

3.1 SMP Overhead on Uniprocessor

We can quantify the overhead of supporting an SMP-capable kernel by comparing the 1T-UP (one processor, uniprocessor kernel) value with the 1T-SMP (one processor, SMP kernel) value. The loss from uniprocessor kernel to SMP kernel on the base 3.06 GHz processor is 10% for Apache, and 13% for Flash and TUX. The losses on the L3-equipped processor and the 2.0 GHz processor are 14% for Apache and 18% for Flash and TUX, which are a little higher than on our base system. The impact on Haboob is relatively low (4%-10%), because it performs the most non-kernel work. The magnitude of the overhead is
fairly large, even though Linux has a reputation of being efficient for low-degree SMP configurations. This result suggests that, for uniprocessors, the performance gained from selecting the uniprocessor kernel instead of the SMP kernel can be significant for these applications.

The fact that the impacts are larger for both the slowest processor and the processor with L3 is also interesting. However, if we consider these results in context, they can be explained. The extra overheads of SMP are not only the extra instructions, but also the extra uncacheable data reads and writes for the locks. The fastest system gets its performance boost from its L3 cache, which makes the main memory seem closer to the processor. However, the L3 provides no benefit for synchronization traffic, so the performance loss is more pronounced. For the slowest processor, the extra instructions are an issue when the processor is running at only two-thirds the speed of the others.

3.2 Uniprocessor SMT Benefits

Understanding the benefits of SMT for uniprocessors is a little more complicated, because it must be compared against a base case. If we compare 1T-SMP to 2T (uniprocessor SMT), the resulting graphs would appear to make a great case for SMT, with speedups in the 25%-35% range for Apache, Flash, and TUX, as shown in Figure 4. However, if we compare the 2T performance [...]

Figure 1: Throughput of Xeon 2.0 GHz processor without L3 cache
Figure 2: Throughput of base Xeon 3.06 GHz processor
Figure 3: Throughput of Xeon 3.06 GHz processor with 1 MB L3 cache
Figure 4: SMT speedup on uniprocessor system with SMP kernel

[...] (NonHaltedCycles ∗ BusSpeed)    (1)

The bus utilization values, broken down by server software, configuration, and processor type, are shown in Figure 7. Several first-order trends are visible: bus utilization tends to increase as the number of contexts/processors is increased, is comparable for all servers except Haboob, and is only slightly lower for L3-equipped processors. The trends can be explained using the observations from the bandwidth
study, and provide strong evidence for our analysis about what causes bottlenecks.

The increased bus utilization for a given processor type as the number of processors and hardware contexts increases is not surprising, and is similar in pattern to the throughput behavior. Essentially, if the system is work-conserving, we expect bus utilization to be correlated with the throughput level. In fact, we see this pattern for the gain from the 2.0 GHz processor to 3.06 GHz: the coefficient of correlation between the throughput and the bus utilization is 0.95. The coefficient for the L3-equipped versus base 3.06 GHz Xeon is only 0.62, which is still high, and provides evidence that the L3 cache is definitely affecting the memory traffic. A more complete explanation of the L3 results is provided below.

The fact that Haboob's bus utilization looks different from the others is explained by its lack of zero-copy support, and in turn explains its relatively odd behavior in Figures 5 and 6. The bulk data copying that occurs during file transfers will increase the bus utilization for Haboob, since the processor is involved in copying buffers and performing TCP checksums. However, the absolute utilization values mask a much larger difference: while Haboob's bus utilization is roughly 50% higher than that of Flash or TUX, its throughput is one-half to one-third the value achieved by those servers. Combining those figures, we see that Haboob has a per-request bus utilization that is three to four times higher than that of the other servers.

The same explanation applies to the bus utilization for the L3-equipped processors, and to Apache's relative gain from SMT. The L3 cache absorbs memory traffic, reducing bus utilization, but for Flash and TUX, the L3 numbers are only slightly below the non-L3 numbers. However, the absolute throughput for the L3-equipped processors is as much as 50% higher, indicating that the per-request bus utilization has actually dropped. The differences in bus utilization then provide some insight into what is happening. For
Flash and TUX, the L3 bus utilizations are very similar to the non-L3 values, suggesting that the request throughput increases until the memory system again becomes the bottleneck. For Apache, the L3 utilization is lower than the non-L3, suggesting that while the memory system is a bottleneck without the L3 cache, something [...]

Figure 8: Cycles per micro-op (CPµ)
Figure 9: L1 instruction cache (Trace Cache) miss rate
Figure 10: L1 data cache miss rate
Figure 11: L2 cache miss rate, including both instruction and data
Figure 12: Instruction TLB miss rate
Figure 13: Data TLB miss rate
Figure 14: Branch misprediction rate
Figure 15: Trace delivery engine stalls
Figure 16: Stalls due to lack of store buffers
Figure 17: # of pipeline clears per byte
Figure 18: # of aliasing conflicts per byte

        Apache-MT    TUX
µPB     13.0         6.0
IPB     7.1          3.4

[...] sharing. In comparing Apache-MT to Apache-MP, we do see some reduction in the 4T L1 miss rate, but the miss rate is still higher than in the 2P cases. Thus, while the multithreaded code helps reduce the pressure, the SMT ICache pressure is still significant. The L2 miss rate drops in all cases when SMT is enabled, indicating that the two contexts are reinforcing each other. The relatively high L2 miss rate for TUX is due to its lower L1 ICache miss rate; in absolute terms, TUX has a lower number of L2 accesses. The interactions on CPI are complex: the improved L2 miss rates can reduce the impact of main memory, but the much worse L1 miss rates can inflate the impact of L2 access times. We show the breakdowns later when calculating overall CPI values.
• TLB misses. In the current Xeon processor, the Instruction Translation Lookaside Buffer (ITLB) is duplicated and the shared DTLB is tagged with each logical processor's ID. Enabling SMT drops the ITLB miss rate (shown in Figure 12) while increasing the DTLB miss rate (shown in Figure 13). The DTLB miss rate increase is expected, since the threads may be operating in different regions of the code. We believe the drop in the ITLB miss rate stems from the interrupt handling code executing only on the first
logical processor, effectively halving its ITLB footprint.
• Mispredicted branches. Branches comprise 15%-17% of instructions in our applications. Each mispredicted branch has a 20-cycle penalty. Even though all five servers show 50% higher misprediction rates with SMT, the overall cost is not significant compared to cache misses, as we show in the breakdowns later.
• Instruction delivery stalls. The cache misses and mispredicted branches result in instruction delivery stalls. This event measures the number of cycles halted when there are no instructions ready to issue. Figure 15 shows the average cycles stalled for each byte delivered. For each server, we observe a steady increase from 1T-UP to 4T, suggesting that with more hardware contexts, the number of cycles spent stalled increases.
• Resource stalls. While the value of instruction delivery stalls measures performance in the front end of the pipeline, stalls may also occur during pipeline execution stages. This event measures the occurrence of stalls in the allocator caused by store buffer restrictions. In the Xeon, buffers between major pipeline stages are partitioned when SMT is enabled. Figure 16 shows cycles stalled per byte due to lack of store buffers in the allocator. Enabling SMT doubles the number of stall cycles for each byte transferred. Unfortunately, stalls due to other buffer conflicts, such as the renaming buffer, are not available on existing performance-monitoring counters. We expect similar pressure is also seen in other buffers.
• Pipeline clears. Due to the Xeon's design, there are conditions in which all non-retiring stages of the pipeline need to be cancelled. This event measures the number of these flushes. When this happens, all of the execution resources are idle while the clear occurs. Figure 17 shows the average number of pipeline clears per byte of content. The SMT rate is a factor of 4 higher, suggesting that pipeline clears caused by one thread can affect other threads executing
simultaneously. Profiling on this event indicates that more than 70% are caused by interrupts. Haboob's high clear rate in 4T mode may be responsible for some of its performance degradation.
• 64K aliasing conflicts. This event occurs when the address of a load or store conflicts with another reference which is in progress. When this happens, the second reference cannot begin until the first one is evicted from the cache. This type of conflict exists in the first-level cache and may incur significant penalties for loads that alias to preceding stores. The number of conflicts per byte is shown in Figure 18. All of the servers show a fairly high number of conflicts, suggesting an effective direction for further optimization.
• Putting cycles together. We estimate the aggregated cycles per instruction of these negative events and compare them to the measured CPI. While it is possible to estimate the penalty of each event, [...]
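
As a rough illustration of the aggregation described under "Putting cycles together", the following Python sketch multiplies each event count by a per-event penalty, sums the resulting cycles, and divides by the instruction count. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the names `StallEvent`, `estimated_cpi`, and `unexplained_cpi` are hypothetical, and all counts and penalties are placeholder values except the 20-cycle branch misprediction penalty quoted above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StallEvent:
    """One class of negative event taken from a performance-counter profile."""
    name: str
    count: int             # number of occurrences (hypothetical sample value)
    penalty_cycles: float  # assumed cost of one occurrence, in cycles


def estimated_cpi(events: Iterable[StallEvent], instructions: int) -> float:
    """Sum the penalty cycles of all events and express them per instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    total_cycles = sum(e.count * e.penalty_cycles for e in events)
    return total_cycles / instructions


def unexplained_cpi(measured_cpi: float, estimate: float) -> float:
    """Gap between the measured CPI and the event-based estimate."""
    return measured_cpi - estimate


# Placeholder profile: only the 20-cycle branch penalty comes from the text;
# every count and every other penalty is an assumption made for illustration.
SAMPLE_EVENTS = [
    StallEvent("branch_mispredict", count=2_000, penalty_cycles=20.0),
    StallEvent("l1_data_miss", count=10_000, penalty_cycles=18.0),
    StallEvent("l2_miss", count=500, penalty_cycles=300.0),
]
SAMPLE_INSTRUCTIONS = 100_000

if __name__ == "__main__":
    estimate = estimated_cpi(SAMPLE_EVENTS, SAMPLE_INSTRUCTIONS)
    print(f"estimated stall CPI: {estimate:.2f}")
    print(f"unexplained CPI vs. measured 4.0: {unexplained_cpi(4.0, estimate):.2f}")
```

A plain sum treats every penalty as fully exposed. Events can overlap in an out-of-order pipeline, so an estimate of this kind is better read as an upper bound to set against the measured CPI than as an exact breakdown.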