The Program for Research on Private Higher Education
Dedicated to Building Knowledge about Private Higher Education around the World

Guide to the PROPHE Dataset

06/26/2018

This guide describes PROPHE’s dataset. Four sections lay out the necessary information, identifying the dataset’s scope and contours. Strictly speaking it is an enrollment dataset, but because it is PROPHE’s sole comprehensive and highly structured dataset, we call it simply “the dataset” and avoid repeating the word enrollment throughout.

      A. PROPHE’s Dataset Development: Beyond UIS

      B. Data Substitution Guidelines

      C. Inclusivity: Counting Private and Public

      D. Regional Categorizations

 

A. PROPHE’s Dataset Development: Beyond UIS 

“N” of countries. PROPHE’s dataset covers 210 countries. As 18 lack any higher education data, however, the effective “N” for data analysis is 192.

Countries not showing private higher education (PHE). PROPHE retains in its 192-country dataset 13 countries that report no PHE data (which is quite different from reporting PHE as zero, a real number). The 13 instead show just a total enrollment, which could be all public, or show just a public enrollment, which could be the country’s total. But because some of the 13 probably do have (unshown) PHE, including them underestimates the global private share (private enrollment / total enrollment): the numerator of private enrollment stays the same while the denominator of total enrollment grows. The effect is minimal, however. As Table 1 shows, if we were to delete the 13 countries with their 672,821 enrollments, PHE’s global share would rise by only 0.1 percentage point, to 33.0%.

 Table 1. Private Share of Total Global Enrollment: All Countries vs. Countries Showing Sectoral Data

                                2010
                                Private %    Private        Total
 Global                         32.9         56,722,374     172,546,175
 Global with PP distinction     33.0         56,722,374     171,873,354
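
The arithmetic behind Table 1 can be checked in a few lines. The Python below is only an illustrative sketch: the function and variable names are hypothetical, and the figures are copied from Table 1.

```python
def private_share(private_enrollment, total_enrollment):
    """Return private enrollment as a percentage of total enrollment."""
    return 100.0 * private_enrollment / total_enrollment


# Figures copied from Table 1 (2010).
PRIVATE_2010 = 56_722_374
TOTAL_ALL_COUNTRIES = 172_546_175       # all 192 countries
TOTAL_WITH_DISTINCTION = 171_873_354    # without the 13 countries showing no PHE data

share_all = private_share(PRIVATE_2010, TOTAL_ALL_COUNTRIES)
share_distinction = private_share(PRIVATE_2010, TOTAL_WITH_DISTINCTION)

# Enrollment of the 13 countries: only the denominator changes.
excluded_enrollment = TOTAL_ALL_COUNTRIES - TOTAL_WITH_DISTINCTION

print(f"{share_all:.1f}")          # 32.9
print(f"{share_distinction:.1f}")  # 33.0
print(excluded_enrollment)         # 672821
```

Because the private numerator is identical in both rows, the entire difference between the two shares comes from the 672,821 enrollments removed from the denominator.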

 

UIS coverage. The chief source from which PROPHE could build a dataset strong in reliability, inclusiveness, and comparability had to be the UNESCO Institute for Statistics (UIS), the only international agency gathering enrollment data by sector. [1] Since 1999, the UIS has annually solicited and displayed “tertiary education” data, aiming to include all its “levels,” 5-8. [2] UIS continues to add countries to its list, but this has little practical effect for global and regional analysis: the additions are generally very small countries, and the large majority of them report no data at all on higher education. [3]

Longitudinal coverage. Longitudinally, PROPHE’s core coverage is the new century’s first decade, 2000-2010, but PROPHE has not yet fully analyzed or posted the pre-2010 data. PROPHE also tracks the post-2010 years, but it takes some time before UIS data achieve broad international coverage, as countries vary in how promptly they report. Moreover, for several years after the year in question, the UIS often issues revised data, presumably more inclusive and accurate. (Separately, PROPHE has datasets on some individual countries at www.prophe.org/en/data-laws/.)

UIS problems. The UIS data can be only as good as what countries’ designated official offices give it, which in turn depends on what the offices can obtain from individual higher education institutions. For a variety of reasons, ranging from limited technical capacity to political incentives for institutions to inflate or deflate enrollment numbers in either sector, reliability varies by country, largely in accord with general development level. Some countries gather no PHE data, others no data for higher education generally, whether for particular years or always. Even when data appear complete, we cannot know how the guidelines UIS distributes to countries about what is “tertiary” education or “public” or “private” are being understood or followed in particular countries. All this no doubt contributes to data inconsistencies, a significant caveat when it comes to cross-national and cross-regional analysis.

Improving over UIS. Using mostly UIS data, PROPHE inherits several of these “genetic” difficulties. But it also takes significant measures to overcome some UIS limitations. The specifics are laid out in Section B below; they allow us to seek out data where UIS either provides none or provides dubious data. These improvements are essential in allowing PROPHE to build its 192-country dataset solidly.

Comparative risks. Data substitutions generally involve risk of distortion—sometimes a virtual guarantee of at least small distortion. We do not substitute because substitutions are perfect. We substitute because leaving missing data or inserting obviously false data is more often worse. Blank boxes not only provide no information for a country in whatever years, but also cripple us in assessing a country’s longitudinal change. Especially in large countries, blank boxes can significantly distort sub-regional, regional, and even global calculations both at snapshot points in time and longitudinally. However, where we focus on 2010 alone we largely escape longitudinal problems and, additionally, our data are stronger for 2010 than for earlier years, with only rare need to resort to estimation for 2010 figures.

 

B. Data Substitution Guidelines

Overview. Four main and specific data substitution guidelines lay out the circumstances under which PROPHE turns to non-UIS data and how it does so. The guidelines apply mostly, but not always fully, in sequential order. More importantly, sequential order does not reflect the weight of each guideline. #3 probably has the biggest impact on our dataset; #2, itself with two components, probably has the next biggest. Post-2010, as UIS has fewer blanks in its data, on large countries especially, #3 should recede somewhat in importance. After presenting the guidelines, we will show the frequency of their use (see Table 2 below). For particular situations not covered by these four guidelines, PROPHE composes individual notes that explain its choices. (These notes appear with the data tables.) Instances warranting a separate note include cases where data sources show only “total” or “public” without identifying a distinction between the two, and cases where we calculate private enrollment from one source providing total enrollment and another source providing private shares.

1. OECD and EuroStat. When UIS data are missing or problematic, we sometimes find handy help from European-based organizations: OECD StatExtracts (http://stats.oecd.org/Index.aspx?DatasetCode=RENRL) or EuroStat (http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=educ_enrl1at&lang=en). The rationale for such substitution is that UIS, OECD, and EuroStat jointly administer the “UOE” (UNESCO-UIS/OECD/EUROSTAT) data collection and thus in theory should have the same data source in most years, accordingly allowing for insertions that are consistent with any UIS data we do show.

2. Data from Other Years. When no figure is available for a given year but is for another year or years, we have two ways to substitute:

2a. Calculating the Compound Growth Rate. Where figures are available for at least two years, and neither figure is 0, we estimate the missing year’s figure by interpolating or extrapolating based on the compound annual growth rate implied by the two most proximate years. (Estimation is done for both private share and total enrollment where necessary.)

2b. Simple Substitution. However, we cannot interpolate or extrapolate when only one figure is available; similarly, we cannot do so when our prior year’s value is 0 (see guideline #4 below on when our ensuing year’s PHE value is 0). We therefore substitute the most proximate figure for the missing year entry, with no adjustment. Simple substitution obviously risks greater distortion when the years are far removed. Fortunately, the most frequent substitution comes from merely one year removed, followed by two years removed. Simple substitution presents an automatic longitudinal problem, as replicating a figure necessarily precludes showing any change, thus exaggerating stability. At an extreme, a country for which there are data for only one year would appear to experience no change from 2000 to 2010. Likewise, and especially for large countries, simple substitutions will understate the amount of sub-regional and regional change.

3. National Data. We turn to data from national (or regional) sources under two circumstances: (a) UIS does not provide data (and neither do the European organizations); (b) expert judgment is that the UIS data in question are far from accurate. (A special case of (b) involves keeping the international agency figure but altering its private/public categorization.) Where national data seem out of line with UIS data from other years, we report our determination in an individual note. A plain risk in using national data is that individual countries can define and count “higher education” differently, leaving international inconsistency in what is included and excluded. But, as noted above, although UIS and other international agencies try to guide countries to categorize, count, and report in common ways, in reality they too depend on what the countries give them. A vivid illustration of the value of guideline #3 is that UIS wrongly counts as private the public enrollment in the 16th largest system (UK), in turn leading some agencies and scholars to report a very inflated private share for Europe overall. Another major example is that through 2010 UIS, while showing the total enrollment, failed to show the private share in what are now the world’s two largest higher education systems (China and India). Only since 2013 has the UIS shown their private enrollment.

4. Reasoning to 0 for PHE. Where our source shows data for total higher education but not PHE, yet shows 0 PHE enrollment in subsequent years, we use the higher education total for the given year and put PHE as 0. The logic is that, as a rule, PHE would not yet have emerged in the earlier year if it is absent in a later year, and that we would know of any exceptions in which existing PHE has been abolished (almost surely never in our dataset’s time period).
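
Guidelines #2a, #2b, and #4 are mechanical enough to sketch in code. The Python below is a minimal illustration under stated assumptions, not PROPHE’s actual procedure: the function names, the year-to-figure dictionaries, and the tie-breaking rule in `simple_substitute` (prefer the earlier year when two known years are equally close) are all hypothetical.

```python
def cagr_estimate(year_a, value_a, year_b, value_b, target_year):
    """Guideline #2a: interpolate or extrapolate a missing year from two
    known, nonzero figures using the compound annual growth rate they imply."""
    if value_a == 0 or value_b == 0:
        raise ValueError("compound growth needs two nonzero figures")
    if year_a == year_b:
        raise ValueError("need figures from two different years")
    rate = (value_b / value_a) ** (1.0 / (year_b - year_a)) - 1.0
    return value_a * (1.0 + rate) ** (target_year - year_a)


def simple_substitute(known, target_year):
    """Guideline #2b: copy the figure from the most proximate year, unadjusted.
    `known` maps year -> figure. Ties go to the earlier year (an assumption)."""
    nearest = min(known, key=lambda year: (abs(year - target_year), year))
    return known[nearest]


def reason_to_zero(phe_by_year, total_by_year, target_year):
    """Guideline #4: if PHE is missing for target_year, a total is available,
    and a later year reports PHE as 0, put PHE as 0. Otherwise return None."""
    if phe_by_year.get(target_year) is not None:
        return phe_by_year[target_year]
    if target_year not in total_by_year:
        return None
    later_zero = any(year > target_year and value == 0
                     for year, value in phe_by_year.items())
    return 0 if later_zero else None


# Hypothetical figures, for illustration only.
estimate_2005 = cagr_estimate(2000, 100_000, 2010, 200_000, 2005)
print(round(estimate_2005))                                  # 141421
print(simple_substitute({2008: 5_000}, 2010))                # 5000
print(reason_to_zero({2005: 0}, {2000: 12_000}, 2000))       # 0
```

The same `cagr_estimate` call covers interpolation and extrapolation, since the target year may fall between the two known years or outside them. Note how `simple_substitute` returns the same figure for any target year when only one year is known, which is exactly the exaggerated stability that guideline #2b warns about.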

Substitution frequency. How often, then, does our dataset draw on these data substitution guidelines? Often, as Table 2 shows for 2010. Tallying all 3 years, data were generated 460 times. That is 460 times where we would otherwise have had no data or less reliable data for either the private or total values. Clearly the guidelines play a major role in building a formidable dataset. As to relative frequency, substitution is a little more common for total than for private values. Longitudinally, the use of data substitution guidelines predictably diminishes across time, as more countries provide usable data, but even in 2010 the guidelines allow us to fill in 122 boxes. Each of the guidelines makes a notable contribution. #1 makes the smallest contribution but is also probably the least risky guideline to invoke. #2a, #2b, and #3 are each much used for both private and total, throughout the period.

Table 2. Frequency of Data Substitution Guideline Usage

Guideline    Private 2010    Total 2010
#1            1               1
#2a          18              35
#2b          19              10
#3           13              12
#4           13               0
Total        64              58

Note: The “private” count always applies to both private number and private %.
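
As a quick consistency check, the column totals in Table 2 and the figure of 122 filled boxes for 2010 can be recomputed from the per-guideline counts. This is an illustrative sketch only; the dictionary layout and variable names are hypothetical, and the counts are copied from Table 2.

```python
# Per-guideline counts from Table 2: (private 2010, total 2010).
TABLE_2 = {
    "#1": (1, 1),
    "#2a": (18, 35),
    "#2b": (19, 10),
    "#3": (13, 12),
    "#4": (13, 0),
}

private_substitutions = sum(private for private, _ in TABLE_2.values())
total_substitutions = sum(total for _, total in TABLE_2.values())
boxes_filled_2010 = private_substitutions + total_substitutions

print(private_substitutions)  # 64
print(total_substitutions)    # 58
print(boxes_filled_2010)      # 122
```

Guideline #4 contributes nothing to the total column by construction, since it only ever sets a private value to 0.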

 

C. Inclusivity: Counting Private and Public

Operationalized definition. To say PROPHE counts as private all enrollment that is private begs the question of what is private. This definitional question commands much analysis, across policy fields and historical periods. But for counting purposes our rule is simple: whatever is legally treated as private within a given country. This has been the rule usually employed in the PHE literature, whether explicitly or not. [4] It dovetails well with government or other nationally collected data used domestically and provided to international organizations. An admittedly inherent problem lies in the latitude for any country to label and treat as private something not taken as private in another country. As noted above, UIS attempts to mitigate variation but some variation probably exists, just as it does in how countries define and count tertiary education. Odd cases arise with institutions like “public Catholic” universities. It appears, however, that the degree of variation in usage is not large, just as it was not when Organization of American States (OAS) private-public data across Latin American countries were used in the past.

Sweeping coverage. Our operational definition favors inclusivity. As scholarship increasingly documents, private sectors are internally often rife with contrasting subsectors, types, and forms. Elite or non-elite, secular or religious, large or small, nonprofit or for-profit, cross-border or domestic only, freestanding or sites within broader networks or chains, all fit as long as they are officially counted as private in their country. The same holds even where countries use semi-synonyms for political or other reasons, as with nonpublic or people-run. Operationally, we count as private the main sector other than public, even if the official term for that second sector is a euphemism.

Privateness. A corollary is that we do not act according to “how private” sectors or their component parts “really are.” Degree of privateness is not determinative for definition, labeling, or counting. At an extreme a private university might have less privateness than a public university does, especially when we compare across countries. Degree as well as shape of privateness are of course crucial to many analytical concerns, and often to policy concerns. But how much privateness exists among legally private institutions is an empirical question properly addressed through scholarship. For counting we are as straightforward as possible.

Government-dependent. Inclusion of “government-dependent privates” illustrates how we define, count, and include. Government-dependent is a formal label used by European data-collecting agencies and the UIS. Juxtaposed to “independent privates,” government-dependent privates generally have much less privateness, but there is fuzziness on definition and operationalization. [5] Moreover, UIS does not break out the numbers between the two private categories, though other organizations at least sometimes do so, and we thus have been able to discover that the large majority of private enrollment is in fact independent private; for EU countries, 2009, the private total share is 15.6% while the private independent share is 12.0%. [6] Regardless, the decisive reason for the PROPHE dataset to include the government-dependent institutions as private is that official usage counts them as such.

Easily the most common reality and reporting of government-dependent PHE comes in Europe, East and West. Even there, however, the dependent enrollment appears significant in only a few countries, including Belgium, Estonia, and Latvia. The one major country UIS lists as having by far the largest government-dependent sector—100% in the UK—vanishes through the aforementioned PROPHE guideline on national data substitution. [7] The only other country for which PROPHE changes UIS government-dependent private to public is Israel. On the other hand, the category proves analytically powerful in the Indian case even though it is not in the parlance there. (See notes on each of these three countries: the UK, Israel, and India.)

 

D. Regional Categorizations

Overview. Regional categorizations are explained in 2 consecutive parts: (a) regionalization and sub-regionalization, and (b) development identification.

Regionalization and Sub-regionalization

7 regions. The PROPHE dataset covers seven regions: Africa (Sub-Saharan), Arab, Asia, Developed British Commonwealth, Europe, Latin America, and the US. Two of these regions are sub-regionalized: Europe is divided into East/Central and West, Asia into Central/West, East, Pacific Islands, South, and Southeast.

Beyond UIS classification. There is no one definitive categorization of the world’s regions or sub-regions. Any will have vulnerabilities; none will garner consensus. Like our dataset, our regional categorization starts with the UIS and UNESCO and then modifies. They show 8 regions (http://www.uis.unesco.org/Education/Documents/uis-regions-2012.pdf). We retain 3 intact: Africa (Sub-Saharan), Arab, and Latin America and the Caribbean. Their other 5 regions are: North America and Western Europe, Central and Eastern Europe, and three regions in Asia: Central Asia, East Asia and the Pacific, and South and West Asia. With both common usage and higher education history and development level in mind, we were uncomfortable with some groupings. We designate Europe and Asia their own conventional places as regions, and that involves not attaching either of them to other regions. We reason that by prominence and uniqueness the US should stand alone, not attached to Europe or Canada. Granted, “Developed British Commonwealth” is an invented “region” and a small one, but it too reflects historical roots.

Sub-regions. While we insist on the “unification” of Europe and Asia, respectively, we see good reason to then divide each into sub-regions. The East/Central versus West split is compelling through post-war history overall and the related bifurcation of higher education’s postwar realities. For Asia, sub-regions seem imperative given the region’s unmatched size and variation. Admittedly our 5 sub-regions leave us with a tiny sub-region (Pacific Islands) and the anomaly of having developed Japan and South Korea in the same sub-region, East Asia, as developing countries; but we give scant attention to the Pacific Islands and we take care to break out Japan and South Korea when analyzing differences between the developed and developing worlds.

Listing 7 regions and 7 sub-regions. Thus, PROPHE’s 7 regions, with the 7 sub-regions (of the 2 sub-regionalized regions) are:

  1. Africa (Sub-Saharan)
  2. Arab
  3. Asia (with sub-regions of Central and West Asia, East Asia, Pacific Islands, South Asia, and Southeast Asia)
  4. Developed British Commonwealth
  5. Europe (with sub-regions of Central/East and West)
  6. Latin America and the Caribbean
  7. US
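
For readers who work with the dataset programmatically, the listing above maps naturally onto a small lookup structure. The Python below is a hedged sketch: the dictionary name and layout are hypothetical, and only the region and sub-region names come from the list above.

```python
# PROPHE regions mapped to their sub-regions (empty list = not sub-regionalized).
PROPHE_REGIONS = {
    "Africa (Sub-Saharan)": [],
    "Arab": [],
    "Asia": [
        "Central and West Asia",
        "East Asia",
        "Pacific Islands",
        "South Asia",
        "Southeast Asia",
    ],
    "Developed British Commonwealth": [],
    "Europe": ["Central/East", "West"],
    "Latin America and the Caribbean": [],
    "US": [],
}

n_regions = len(PROPHE_REGIONS)
n_subregions = sum(len(subs) for subs in PROPHE_REGIONS.values())
subregionalized = [region for region, subs in PROPHE_REGIONS.items() if subs]

print(n_regions)        # 7
print(n_subregions)     # 7
print(subregionalized)  # ['Asia', 'Europe']
```

Keeping sub-regions nested under their region makes it straightforward to aggregate enrollment at either level without maintaining two separate country lists.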

Changes from UIS. The specific changes from the UIS/UNESCO regional categorization to the PROPHE regions and sub-regions are:

  1. UIS’ North America and Western Europe: U.S. taken out and stands alone; Canada taken out and put into Developed British Commonwealth; Western Europe merged with Eastern Europe as one region with two sub-regions.
  2. Created Developed British Commonwealth category: Canada joined by Australia, New Zealand and Tokelau (all taken from UIS East Asia and the Pacific).
  3. UIS’ Central Asia joined with Iran to form Central and Western Asia.
  4. UIS’ South and West Asia changed to South Asia after moving Iran into Central and Western Asia.
  5. UIS’ East Asia and the Pacific split into 3 sub-regions: East Asia, Southeast Asia (ASEAN countries), and Pacific Island Countries, and with the moving of Australia, New Zealand and Tokelau to Developed British Commonwealth.

Possible further sub-regionalization. However sound the reasons for sub-regionalization of Asia and Europe, they do not preclude exploring sub-regionalization in other regions. Africa could be divided into East and West but more likely by language into Anglophone, Francophone, and Lusophone, as these reflect different colonial seeds. The Arab region could also be divided by British or French colonial roots. Latin America could be divided into Mexico, Central America, and the Caribbean, on the one hand, and South America on the other, possibly dividing the latter into Andean and Southern Cone; or, again to reflect different colonial roots, into “Spanish America” and Brazil. But Latin America is less often divided along such lines than Africa is, and both Africa and the Arab regions are comparatively small for sub-regionalization. The Developed British Commonwealth has too few countries to warrant division. And whereas there is ample precedent for dividing the U.S. national case by its own states or regions, for global analysis we must accept the US as a sole entity.

Country placements. However one categorizes regions and sub-regions, questions remain on the placement of individual countries into those categories. Israel and Turkey are examples, either part of Europe or its neighbors. Yet very few countries present regional placement dilemmas if we follow geography (as PROPHE does other than in Developed British Commonwealth, and as the UIS does not do). On sub-regions, the East-West divide is rather clear for Europe whereas country placement into Asia’s sub-regions is admittedly more problematic. When it comes to country groupings into regions and sub-regions the claim is not to objective superiority but to reasonable decisions within the mainstream.

Development Identification

Development levels classification. Different international agencies use different but often largely similar categories to represent development levels. [8] As with categorization by region and sub-region, so with development level, one can quarrel with any categorization; no claim is made here that PROPHE’s is superior, only that it is viable within the mainstream. It is helpful if we can place each of our 7 regions into one or the other category. We can, and we do, with the single exception, noted above, of moving Japan and South Korea to the developed group while counting the rest of Asia in the developing group. Singapore and Brunei now also raise questions about development level, as might Taiwan, Hong Kong, and Macao if listed as separate (“country”) entities. Categorization is likewise blurry when it comes to the poorer countries of Eastern and Central Europe. Obviously a country’s level of development can change from one time period to another, presenting challenges for longitudinal categorization. Furthermore, as is also common with even reasonable categories, any number of individual countries placed into one category could reasonably be seen in a different category. But we are opting for the advantages of limiting the number of categories to just two (developed and developing) from seven regions, and thus accepting the weaknesses of categorizing together rather varied entities.

Population vs enrollment. Basic confirmation of our development designations comes from comparing population shares to total enrollment shares. Developing regions would have low enrollment-to-population ratios, developed ones the reverse. In fact, six of our seven regions fit their development designation, the stark exception being Latin America, though the Arab region fits only weakly. But the fit is shown very powerfully on the developed end by the US, Europe, and the Developed British Commonwealth and on the developing end by Africa. Moreover, Asia’s fit is tight when we shift Japan and South Korea from (“developing”) Asia overall to the developed world side.


[1] In fact, UIS shows only private share and total enrollments, and from these PROPHE calculates private enrollment. The World Bank simply shows almost the same UIS data in slightly different form, acknowledging the UIS as its source. UIS data downloaded from UIS on 6/14/2012. http://unstats.un.org/unsd/methods/m49/m49alpha.htm. The UIS actually listed 209 countries, but PROPHE’s addition of Kosovo, which the UN does not recognize, makes for our dataset of 210 (which will eventually become 211 when we add Taiwan).

[2] Level 6 corresponds most closely to conventional usage of higher education, usually university, while levels 7 and 8 correspond respectively to Master’s and Doctoral studies. Level 5 is non-university, short-cycle tertiary, normally requiring a secondary school degree. But tertiary excludes level 4, further education that adds something after or beyond upper secondary education.

[3] By 2016, UIS listed 241 countries, adding 32 to those PROPHE had downloaded in 2012 (http://unstats.un.org/unsd/methods/m49/m49alpha.htm). Of those 32 countries, most little known by name, only 3 provide any higher education data, and only one provides dual-sector data.

[4] Levy, Daniel C. 1986. Higher Education and the State in Latin America: Private Challenges to Public Dominance. Chicago: University of Chicago Press.

[5] The organizations do not always use a uniform definition and leave ambiguity on how to operationalize it. There are both financial and governance components to the definitions but no mention of how to tally when one component points one way, the second the other way. Moreover, even within each of the two components are sub-components that need not point in the same direction. On the financial side one criterion is an institution’s receipt of at least 50% of its core funding from government agencies while another is that its teaching personnel be paid by a government agency, whether via the institution or directly. The governance criteria are further complicated by providing no clue as to how to quantify them. See http://uis.unesco.org/en/glossary-term/government-dependent-private-institution.

[6] Levy, Daniel C. 2012. “How Important Is Private Higher Education in Europe? A Regional Analysis in Global Context.” European Journal of Education 47 (2): 178–97.

[7] With the U.K. case in mind as an errant UIS classification as 100% private (see table note on the UK), we could be suspicious that UIS might mistakenly put 100% private on other systems as well. It turns out, however, that instances are very few—to date, leaving aside the UK, only 2 countries appear as 100% private, 2010, with total enrollment under 1000.

[8] The UN classifies countries into one of three broad categories: developed economies, economies in transition, and developing economies, with also a separate designation for least developed countries. According to UN classification in 2014, the three broad categories are mutually exclusive and the economies in transition category cannot be put into either the developed or developing baskets. On the other hand, the World Bank classifies by level of income: high income, middle income (lower-middle and upper-middle), and low income economies. Clearly, allowing more categories can allow for closer fits between country and category.