Kernel: Python 3
Visualizing QS World University Rankings from 2017 to 2022
QS World University Rankings is an annual publication of global university rankings by Quacquarelli Symonds (QS), a UK company specialising in the analysis of higher education institutions around the world. The QS ranking is approved by the International Ranking Expert Group (IREG). It is viewed as one of the three most widely read university rankings in the world, along with the Academic Ranking of World Universities and the Times Higher Education World University Rankings. In December 2003, Richard Lambert's review of university-industry collaboration in Britain for HM Treasury, the finance ministry of the United Kingdom, recommended world university rankings. Lambert said these would help the UK gauge the global standing of its universities. The first edition of QS World Rankings was released in 2004 in partnership with Times Higher Education (THE), as the Times Higher Education-QS World University Rankings. In 2009, THE split with QS and went on to publish its own rankings. QS has been publishing its university rankings in partnership with Elsevier.
Methodology
QS designed its rankings to assess performance according to what it believes to be the key aspects of a university's mission: teaching, research, nurturing employability, and internationalisation. Its methodological framework assesses universities on six metrics:
- Academic Reputation (40%)
- Employer Reputation (10%)
- Faculty/Student Ratio (20%)
- Citations per faculty (20%)
- International Faculty Ratio (5%)
- International Student Ratio (5%)
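The six weights above sum to 100%, so an overall score can be read as a weighted sum of indicator scores. The sketch below illustrates only that arithmetic. It assumes each indicator has already been normalised to a 0-100 scale. The indicator keys are hypothetical names, and QS's own normalisation step is not reproduced.

```python
# Weights from the methodology list above (they sum to 1.0).
QS_WEIGHTS = {
    "academic_reputation": 0.40,
    "employer_reputation": 0.10,
    "faculty_student_ratio": 0.20,
    "citations_per_faculty": 0.20,
    "international_faculty_ratio": 0.05,
    "international_student_ratio": 0.05,
}


def overall_score(indicator_scores: dict) -> float:
    """Weighted sum of indicator scores.

    Assumes every indicator is already on a 0-100 scale (hypothetical input);
    QS's actual normalisation is not modelled here.
    """
    missing = QS_WEIGHTS.keys() - indicator_scores.keys()
    if missing:
        raise ValueError(f"missing indicators: {sorted(missing)}")
    return sum(w * indicator_scores[name] for name, w in QS_WEIGHTS.items())


# Hypothetical university: 100 on every indicator except employer reputation (50).
example = {name: 100.0 for name in QS_WEIGHTS}
example["employer_reputation"] = 50.0
print(round(overall_score(example), 2))  # 95.0
```

Losing 50 points on a 10% indicator costs only 5 points overall. This is why Academic Reputation, at 40%, dominates the final score.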
About Data
The dataset was obtained by scraping the QS World University Rankings website with Python and Selenium.
Feature Description
The dataset has a total of 15 columns.
- university - name of the university
- year - year of ranking
- rank_display - rank given to the university
- score - score of the university based on the six key metrics mentioned above
- link - link to the university profile page on QS website
- country - country in which the university is located
- city - city in which the university is located
- region - continent in which the university is located
- logo - link to the logo of the university
- type - type of university (public or private)
- research_output - level of research output at the university ("Very High", "High", "Medium" or "Low")
- student_faculty_ratio - number of students per faculty member
- international_students - number of international students enrolled at the university
- size - size of the university in terms of area
- faculty_count - number of faculty or academic staff at the university
In [ ]:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting geopandas
Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
|████████████████████████████████| 1.0 MB 13.5 MB/s
Collecting pyproj>=2.2.0
Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
|████████████████████████████████| 6.3 MB 37.6 MB/s
Collecting fiona>=1.8
Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
|████████████████████████████████| 16.7 MB 63.2 MB/s
Requirement already satisfied: pandas>=0.25.0 in /usr/local/lib/python3.7/dist-packages (from geopandas) (1.3.5)
Requirement already satisfied: shapely>=1.6 in /usr/local/lib/python3.7/dist-packages (from geopandas) (1.8.2)
Requirement already satisfied: six>=1.7 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (1.15.0)
Collecting munch
Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Collecting click-plugins>=1.0
Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Requirement already satisfied: attrs>=17 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (22.1.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (57.4.0)
Requirement already satisfied: click>=4.0 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (7.1.2)
Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (2022.6.15)
Collecting cligj>=0.5
Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (2022.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (2.8.2)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (1.21.6)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1
Import Libraries
In [ ]:
Custom Color Palette
In [ ]:
In [ ]:
Load and Explore Data
In [ ]:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-6-446ab81f94af> in <module>
----> 1 university_df = pd.read_excel("/content/drive/MyDrive/DAE-PUCP/Docentes - policy paper/Base de datos/data2017_2022.xlsx")
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/DAE-PUCP/Docentes - policy paper/Base de datos/data2017_2022.xlsx'
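The traceback above means the Excel file was not at that path when the cell ran. In Colab, paths under /content/drive only exist after Google Drive has been mounted (`from google.colab import drive; drive.mount('/content/drive')`). The hypothetical helper below fails early with a clearer message. `load_rankings` is an illustrative name, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_rankings(path) -> pd.DataFrame:
    """Load the rankings data, failing early with a readable message.

    .xlsx/.xls files go through pd.read_excel (which needs an Excel engine
    such as openpyxl installed); any other extension is read as CSV.
    """
    path = Path(path)
    if not path.is_file():
        # In Colab, /content/drive/... only exists after Google Drive is mounted.
        raise FileNotFoundError(
            f"{path} not found: check the path, or mount the drive that holds it"
        )
    if path.suffix.lower() in {".xlsx", ".xls"}:
        return pd.read_excel(path)
    return pd.read_csv(path)
```

Usage: `university_df = load_rankings("/content/drive/MyDrive/DAE-PUCP/Docentes - policy paper/Base de datos/data2017_2022.xlsx")`.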
In [ ]:
In [ ]:
In [ ]:
Data Cleaning and Preprocessing
The output of the dataset's info() method shows null values in several columns. Let's count the null values in each column.
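A minimal sketch of that count, run on a small made-up frame standing in for `university_df`. It reports both the absolute number and the percentage of nulls per column.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for university_df (hypothetical values).
university_df = pd.DataFrame({
    "university": ["A", "B", "C", "D"],
    "score": [98.1, np.nan, np.nan, np.nan],
    "faculty_count": [3000, np.nan, 1200, 800],
    "city": ["X", "Y", "Z", "W"],
})

null_counts = university_df.isnull().sum()                   # nulls per column
null_share = university_df.isnull().mean().mul(100).round(1)  # % of rows null

summary = pd.DataFrame({"n_null": null_counts, "pct_null": null_share})
# Show only columns that actually have gaps, worst first.
print(summary[summary["n_null"] > 0].sort_values("n_null", ascending=False))
```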
In [ ]:
In [ ]:
Before handling the null values, let's check whether the missing values are correlated. I used the missingno package, a simple Python package for visualizing missing data. Visualizing the correlation between missing values can give better insight into how the data went missing. Learn more about types of missingness here.
In [ ]:
The correlation heatmap includes only the columns with missing values. The higher the correlation, the more strongly missing values in one column coincide with missing values in another. 'faculty_count' shows a significant correlation with 'student_faculty_ratio' and 'international_students', while the other columns show little or no correlation.
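The heatmap's numbers are correlations between "is this value missing?" indicators. The sketch below computes a pandas-only approximation of them. It keeps only columns that are partly missing, because a column that is never or always null has constant indicators and an undefined correlation. missingno may differ in details such as masking and rounding. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between the missing-value indicators of columns."""
    is_null = df.isnull()
    # Keep columns that are neither fully present nor fully missing.
    partially_missing = is_null.any() & ~is_null.all()
    return is_null.loc[:, partially_missing].astype(float).corr()


# Toy data: the two numeric columns tend to be missing together.
toy = pd.DataFrame({
    "faculty_count": [1, np.nan, 3, np.nan, 5],
    "student_faculty_ratio": [1, np.nan, 3, np.nan, np.nan],
    "city": ["a", "b", "c", "d", "e"],
})
print(nullity_correlation(toy).round(2))
```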
Since multiple columns have missing values, let's drop the rows that have more than 4 missing values. A university missing that many of its attributes is of little use for analysis.
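Two equivalent ways to express "drop rows with more than 4 missing values" are sketched here on a made-up 8-column frame. The second uses `dropna(thresh=...)`. Note that `thresh` counts the minimum number of non-null values a row must keep, not the number of nulls allowed.

```python
import numpy as np
import pandas as pd

MAX_MISSING = 4  # rows with more than this many nulls are dropped

toy = pd.DataFrame(np.ones((3, 8)))
toy.iloc[1, :5] = np.nan  # 5 missing values -> dropped
toy.iloc[2, :4] = np.nan  # 4 missing values -> kept

# Option 1: count nulls per row explicitly.
kept_explicit = toy[toy.isnull().sum(axis=1) <= MAX_MISSING]

# Option 2: thresh = minimum number of NON-null values a row must have.
kept_thresh = toy.dropna(thresh=toy.shape[1] - MAX_MISSING)

print(kept_explicit.index.tolist())  # [0, 2]
```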
In [ ]:
Let's drop the 'link' and 'logo' columns, as they only contain hyperlinks. The 'score' column could be very useful for analysis, but it is missing nearly 56% of its values. When I looked these values up on the QS website, I found that QS gives a score only for the top 500 universities, even though 1000+ universities are ranked. So, I'm dropping this column as well.
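A short sketch of this step on hypothetical rows. It checks the share of missing scores and then drops the three columns.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: only the first two universities have a published score.
toy = pd.DataFrame({
    "university": ["Uni A", "Uni B", "Uni C", "Uni D"],
    "link": ["https://example.org/a", "https://example.org/b",
             "https://example.org/c", "https://example.org/d"],
    "logo": ["a.png", "b.png", "c.png", "d.png"],
    "score": [97.5, 88.0, np.nan, np.nan],
})

# The share of missing scores is what justifies dropping the column.
print(f"score missing: {toy['score'].isnull().mean():.0%}")  # score missing: 50%
cleaned = toy.drop(columns=["link", "logo", "score"])
```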
In [ ]:
Next, let's convert the 'international_students', 'faculty_count' and 'rank_display' columns to numeric by removing the special characters they contain.
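A sketch of that conversion, under an assumption about the raw text. It assumes counts look like "3,730" and rank labels like "=25" (a tie) or "501-510" (a band). For rank bands, deleting every non-digit would turn "501-510" into 501510, so the sketch keeps only the first integer, the lower bound of the band. Both helper names are hypothetical.

```python
import pandas as pd


def to_count(series: pd.Series) -> pd.Series:
    """Strip every non-digit character (e.g. thousands separators) and cast."""
    digits = series.astype(str).str.replace(r"[^0-9]", "", regex=True)
    return pd.to_numeric(digits, errors="coerce")  # empty strings become NaN


def rank_lower_bound(series: pd.Series) -> pd.Series:
    """Keep only the first integer in a rank label ("=25" -> 25, "501-510" -> 501)."""
    first_int = series.astype(str).str.extract(r"(\d+)", expand=False)
    return pd.to_numeric(first_int, errors="coerce")


print(to_count(pd.Series(["3,730", "1,250", None])).tolist())
print(rank_lower_bound(pd.Series(["1", "=25", "501-510"])).tolist())  # [1, 25, 501]
```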
In [ ]:
Visualizing universities by year and type
In [ ]:
Each year, more universities are included in the rankings, and 2022 has the highest number of universities.
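The per-year count behind a chart like this can be sketched as a group-by on made-up rows. Counting unique names guards against accidental duplicate rows within a year.

```python
import pandas as pd

# Hypothetical rows, one per university per ranking year.
toy = pd.DataFrame({
    "university": ["A", "B", "A", "B", "C"],
    "year": [2021, 2021, 2022, 2022, 2022],
})
per_year = toy.groupby("year")["university"].nunique()
print(per_year.idxmax(), per_year.max())  # 2022 3
```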
In [ ]:
If you do a simple Google search, you can find many websites claiming that private universities are better than public universities because they tend to rank higher. This dataset paints a different picture: more than 80% of the ranked universities are public.
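A share like "more than 80% public" comes from a normalised value count. The sketch below runs it on a made-up column.

```python
import pandas as pd

# Hypothetical 'type' column: 9 public, 1 private.
toy = pd.DataFrame({"type": ["Public"] * 9 + ["Private"]})
shares = toy["type"].value_counts(normalize=True).mul(100)  # percentages
print(shares.round(1))
```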
Distribution of universities across the world
Now, let's take a look at the geography of the universities.
Universities by Continents
In [ ]:
Europe is the continent with the most universities, although the dataset counts Russia as part of Europe even though it spans both Europe and Asia. Europe is followed by Asia and North America.
Universities by Countries
In [ ]:
Of the 195 countries in the world, only 97 have at least one ranked university.
In [ ]:
The United States has the most ranked universities over the years, followed by the United Kingdom and Germany.
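Both country figures can be sketched on hypothetical rows. The first is the number of distinct countries, and the second is universities per country. The notebook's cells are not shown, so it is an assumption whether they count rows or unique names. This sketch counts unique names, so a university ranked in several years counts once.

```python
import pandas as pd

# Hypothetical rows; MIT appears in two ranking years.
toy = pd.DataFrame({
    "university": ["MIT", "MIT", "Oxford", "ETH Zurich", "Stanford"],
    "year": [2021, 2022, 2022, 2022, 2022],
    "country": ["United States", "United States", "United Kingdom",
                "Switzerland", "United States"],
})
print(toy["country"].nunique())  # 3

per_country = (
    toy.groupby("country")["university"].nunique()
    .sort_values(ascending=False)
)
print(per_country.head(3))
```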
Universities by Cities
In [ ]:
In [ ]:
The graph above shows the top 20 cities by number of unique universities. London is an academic hotspot, with a whopping 19 globally ranked universities!
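"Unique universities" per city means counting distinct names rather than rows, since most universities appear once per ranking year. A minimal sketch with a hypothetical helper and made-up rows:

```python
import pandas as pd


def top_cities(df: pd.DataFrame, n: int = 20) -> pd.Series:
    """Cities ranked by the number of distinct universities they host."""
    return df.groupby("city")["university"].nunique().nlargest(n)


# Hypothetical rows: "Uni L1" is listed twice (two years) but counts once.
toy = pd.DataFrame({
    "university": ["Uni L1", "Uni L1", "Uni L2", "Uni L3", "Uni P1", "Uni P2"],
    "city": ["London", "London", "London", "London", "Paris", "Paris"],
})
print(top_cities(toy, n=2))
```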
Ranking of Top 10 Universities
In [ ]:
In [ ]:
After a quick look at the dataframe, I made a list of the top 10 universities ranked over the years. These 10 universities consistently occupy the top 10 positions.
- MIT is the undisputed leader of the QS rankings, ranked number 1 every year.
- Stanford and Harvard dropped this year (2022) for the first time since 2017.
- The University of Oxford, the oldest university in the English-speaking world, has jumped from rank 6 to rank 2.
- Overall, UK universities have climbed in the rankings, while most US universities dropped this year (2022).
- Of the top 10, only one university, ETH Zurich (Switzerland), is from a country other than the US or UK.
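A table like the one these observations come from can be sketched as a pivot with universities as rows and years as columns. The ranks below are made up for illustration. The year-over-year difference is then positive when a university moved up.

```python
import pandas as pd

# Hypothetical ranks, for illustration only.
toy = pd.DataFrame({
    "university": ["Uni A", "Uni B", "Uni A", "Uni B"],
    "year": [2021, 2021, 2022, 2022],
    "rank_display": [1, 6, 1, 2],
})
rank_table = toy.pivot_table(index="university", columns="year", values="rank_display")
rank_change = rank_table[2021] - rank_table[2022]  # positive = moved up
print(rank_table)
print(rank_change)
```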
QS World Rankings - Contributing Factors
Let's explore the metrics used to assess the universities. Of the 6 metrics, only 3 have a corresponding column in this dataset:
- Research output - 20%
- Student Faculty ratio - 20%
- International students - 5%
Research Output
Next to teaching, academic research is viewed as a very important factor. Understanding research output can give us insight into how the top universities prioritize research.
In [ ]:
Most of the universities under consideration have "Very High" research output. Public universities outperform private universities in terms of research, although raw counts partly reflect the fact that more than 80% of the ranked universities are public.
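Because public universities far outnumber private ones, raw counts favour them at every research level. Comparing shares within each type is a fairer check. The sketch below does both with `pd.crosstab` on made-up rows.

```python
import pandas as pd

# Hypothetical rows: 6 public and 2 private universities.
toy = pd.DataFrame({
    "type": ["Public"] * 6 + ["Private"] * 2,
    "research_output": ["Very High"] * 4 + ["High"] * 2 + ["Very High", "Low"],
})

counts = pd.crosstab(toy["research_output"], toy["type"])
# normalize="columns": each type's column sums to 1, so types are comparable.
share_within_type = pd.crosstab(
    toy["research_output"], toy["type"], normalize="columns"
)
print(counts)
print(share_within_type.round(2))
```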
In [ ]:
As far as the number of faculty is concerned:
- Universities with "Very High" research output have more staff. So, does this mean universities with more faculty do better research? Not necessarily. Universities with "Very High" research output may attract more accomplished academics and researchers because of their reputation, among many other factors.
- Public universities with "Very High" and "Low" research output have nearly equal numbers of staff.
- Private universities with "Low" research output have more staff than those with "High" output. This may mean that not every university puts an emphasis on research, even when it has a large academic staff.
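The comparison above amounts to a mean of `faculty_count` per (research output, type) pair. The sketch below reshapes that into a table. The faculty counts are made up. `reindex` fixes the row order, and levels absent from the data show up as NaN rows.

```python
import pandas as pd

ORDER = ["Very High", "High", "Medium", "Low"]

# Hypothetical faculty counts, for illustration only.
toy = pd.DataFrame({
    "research_output": ["Very High", "Very High", "High", "Low", "Very High", "Low"],
    "type": ["Public", "Public", "Public", "Public", "Private", "Private"],
    "faculty_count": [3000, 2500, 1200, 2700, 1800, 900],
})

faculty_by_group = (
    toy.groupby(["research_output", "type"])["faculty_count"]
    .mean()
    .unstack("type")  # one column per university type
    .reindex(ORDER)   # fixed row order; missing levels become NaN rows
)
print(faculty_by_group)
```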
In [ ]:
In [ ]:
In [ ]:
A quick intro on how to interpret a pointplot.
A pointplot shows an estimate of the central tendency (the mean, by default) of a numeric variable as a point, with error bars indicating the uncertainty around that estimate. This is particularly useful for comparing different levels of a categorical variable: the lines joining the points make it easy to compare slopes between groups. For more info, refer here.
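What each point and error bar encodes can be approximated numerically. The sketch computes group means with a normal-approximation 95% interval. This is a simpler stand-in and not seaborn's method: by default, seaborn bootstraps its intervals. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def group_means_with_ci(df, group_col, value_col, z=1.96):
    """Group means plus a normal-approximation interval (mean +/- z * SEM)."""
    grouped = df.groupby(group_col)[value_col]
    mean = grouped.mean()
    sem = grouped.std(ddof=1) / np.sqrt(grouped.count())
    return pd.DataFrame({
        "mean": mean,
        "ci_low": mean - z * sem,
        "ci_high": mean + z * sem,
    })


toy = pd.DataFrame({"group": ["a", "a", "a", "b", "b"], "value": [1, 2, 3, 10, 10]})
print(group_means_with_ci(toy, "group", "value").round(2))
```

In the notebook, the plot itself would come from seaborn, e.g. `sns.pointplot(data=university_df, x="size", y="student_faculty_ratio", hue="type")`. The column choice in that call is illustrative.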
In [ ]:
The relationship between the size of the university and research output is pretty clear. Universities with "Very High" and "High" research output are larger than those with "Medium" and "Low" output.
Student Faculty Ratio
Student Faculty Ratio is an interesting measure. According to QS, "It is usually cited by students as a metric of highest importance to them". The lower the ratio, the better: a faculty member with fewer students can dedicate more focus and attention to each individual.
In [ ]:
- On average, universities have about 13 students per faculty member.
- Some universities have as few as 1 student per faculty member.
- At the other extreme, some have as many as 67 students per faculty member.
In [ ]:
The distribution is right-skewed. The outliers don't seem to affect the mean much. Most universities have between 5 and 20 students per faculty member.
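The summary figures and the shape of the distribution can be checked numerically, as sketched below on made-up ratios with a long right tail. A positive `skew()` means the distribution is right-skewed. A mean close to the median suggests the outliers barely move it. `between` gives the share in the 5-20 band, with both bounds inclusive.

```python
import pandas as pd

# Made-up ratios with one large outlier.
ratio = pd.Series([3, 6, 8, 9, 10, 11, 12, 14, 18, 45], dtype=float)

summary = {
    "mean": ratio.mean(),
    "median": ratio.median(),
    "min": ratio.min(),
    "max": ratio.max(),
    "skew": ratio.skew(),                          # > 0 means right-skewed
    "share_5_to_20": ratio.between(5, 20).mean(),  # inclusive bounds
}
print(summary)
```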
In [ ]:
Universities with "Very High" research output have a much lower "student faculty ratio" than the rest.
In [ ]:
Within each size category, private universities have a much lower "student faculty ratio" than public universities. Another interesting observation is that the average "student faculty ratio" seems to increase with the "size" of the university.
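To see a trend with size, the size labels need a meaningful order rather than alphabetical order. The sketch assumes the labels are "S", "M", "L" and "XL", makes them an ordered categorical, and tabulates mean ratios by size and type on made-up values.

```python
import pandas as pd

SIZE_ORDER = ["S", "M", "L", "XL"]  # assumed labels of the 'size' column

# Hypothetical ratios, for illustration only.
toy = pd.DataFrame({
    "size": ["S", "S", "M", "M", "L", "L", "XL", "XL"],
    "type": ["Public", "Private"] * 4,
    "student_faculty_ratio": [10, 6, 12, 7, 15, 8, 20, 9],
})
toy["size"] = pd.Categorical(toy["size"], categories=SIZE_ORDER, ordered=True)

ratio_by_size = (
    toy.groupby(["size", "type"], observed=True)["student_faculty_ratio"]
    .mean()
    .unstack("type")  # rows follow SIZE_ORDER, one column per type
)
print(ratio_by_size)
```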
International Students
A university that attracts students from across the world demonstrates a global outlook and has a multiculturally diverse campus.
In [ ]:
- On average, universities have 1,900+ international students.
- One university has an international student intake as high as 31,000+. Let's take a look at it.
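Pulling out that single row is an `idxmax` lookup, sketched below on made-up rows. `idxmax` skips NaN by default, so universities with no reported intake do not interfere.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "university": ["Uni A", "Uni B", "Uni C"],
    "year": [2022, 2022, 2022],
    "international_students": [1900.0, 31500.0, float("nan")],
})
top_row = toy.loc[toy["international_students"].idxmax()]
print(top_row["university"])  # Uni B
```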
In [ ]:
In [ ]:
This distribution is right-skewed as well, with very few outliers. Most universities have an intake between 0 and 5,000.
In [ ]:
International students tend to prefer public universities with "Very High" research output. Is it because of lower tuition fees compared to private ones? Maybe.
Most popular country of choice for International Students
And for the last part, which country is most popular among international students? Can you guess before scrolling down?
In [ ]:
In [ ]:
It's the USA, closely followed by the UK!
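One caveat when totalling international students by country: summing across all ranking years counts the same students several times. The sketch below therefore restricts to the latest year. The original cell is not shown, so whether it did the same is an assumption. The rows are made up.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "university": ["A", "B", "C", "A", "B", "C"],
    "year": [2021, 2021, 2021, 2022, 2022, 2022],
    "country": ["United States", "United Kingdom", "United Kingdom"] * 2,
    "international_students": [5000, 3000, 1500, 5200, 3100, 1600],
})

latest = toy[toy["year"] == toy["year"].max()]  # avoid double counting across years
by_country = (
    latest.groupby("country")["international_students"].sum()
    .sort_values(ascending=False)
)
print(by_country)
```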
If you've come down this far, "THANK YOU!". Let me know in the comments if you have any feedback, criticisms or concerns.
Credits and Acknowledgement
Did I say those words? If you like my work, do hit the upvote button! No, I didn't.