Installing Python Data Science Packages on CentOS: Complete Step-by-Step Guide

Learn how to install essential Python data science packages including NumPy, Pandas, and Scikit-learn on CentOS. Complete with troubleshooting tips and package explanations.

Setting up a Python data science environment can be daunting for beginners, especially when dealing with package dependencies and system-specific installations. This comprehensive guide walks you through installing the three most essential Python data science packages on CentOS: NumPy, Pandas, and Scikit-learn.

🎯 What You'll Learn: In this practical tutorial, you'll discover:

  • How to check your Python version and install pip on CentOS
  • Step-by-step installation of NumPy, Pandas, and Scikit-learn
  • Understanding package dependencies and installation output
  • Troubleshooting common installation issues
  • Verifying successful package installations

🐍 Step 1: Checking Your Python Environment

Before installing any packages, it's crucial to verify your Python installation and version.

Checking Python Version

python --version

Output:

Python 3.9.23

What This Tells Us:

  • Python 3.9.23 is installed and accessible
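  • This version is supported by every package release installed in this guide; the newest NumPy, SciPy, and scikit-learn releases require Python 3.10 or later, so pip selects the last releases that still support 3.9
  • The python command points to Python 3 (good for modern systems)

✅ Version Compatibility: Python 3.9.23 works for everything installed below. Because pip only considers releases that support the running interpreter, it picks NumPy 2.0.2, SciPy 1.13.1, and scikit-learn 1.6.1 rather than newer releases that need Python 3.10+.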
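The same check can be made from inside Python, which is convenient in setup scripts. This is a minimal sketch: the helper name meets_minimum and the default minimum of 3.9 are illustrative assumptions, not part of the original walkthrough.

```python
import sys


def meets_minimum(version_info, minimum=(3, 9)):
    """Return True if the interpreter version is at least `minimum`.

    `meets_minimum` and the (3, 9) default are illustrative choices,
    not part of any library.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the interpreter that is actually running this script.
print("Executable:", sys.executable)
print("Version:   ", ".".join(str(part) for part in sys.version_info[:3]))
print("Meets 3.9+:", meets_minimum(sys.version_info))
```

Printing sys.executable also shows which interpreter the python command resolves to, which helps when several Python versions are installed side by side.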

📦 Step 2: Installing pip (Python Package Manager)

Most CentOS systems don't come with pip pre-installed. Let's install it first.

Initial pip Installation Attempt

pip install numpy

Output:

bash: pip: command not found...
Install package 'python3-pip' to provide command 'pip'? [N/y] y

What Happened:

  • The system detected that pip is not installed
  • The shell's command-not-found handler (provided by PackageKit on this system) offered to install python3-pip
  • We accepted the installation prompt by typing y
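Whether pip is reachable can also be checked from Python before running any install command. The sketch below uses only the standard library; pip_status is a hypothetical helper name chosen for this example.

```python
import importlib.util
import shutil
import sys


def pip_status():
    """Describe how pip can be reached from the current interpreter.

    `pip_status` is a hypothetical helper written for this example.
    """
    return {
        # True if this interpreter can run `python -m pip`.
        "module_available": importlib.util.find_spec("pip") is not None,
        # Full path of a `pip` executable on PATH, or None if there is none.
        "command_path": shutil.which("pip"),
    }


status = pip_status()
print("Interpreter:        ", sys.executable)
print("pip module found:   ", status["module_available"])
print("pip command on PATH:", status["command_path"])
```

If the module is found but no pip command is on PATH, running pip as python3 -m pip usually works without any further setup.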

System Package Installation Process

After accepting the pip installation, the system goes through several phases:

 * Waiting in queue...
 * Loading list of packages....
The following packages have to be installed:
 python3-pip-21.3.1-1.el9.noarch	A tool for installing and managing Python3 packages
Proceed with changes? [N/y] y

Installation Phases:

| Phase | Description | What's Happening |
|---|---|---|
| Waiting in queue | Package manager queue | The installation request is queued |
| Waiting for authentication | Authorization | Confirming the user may install system packages (typically a polkit password prompt) |
| Downloading packages | Package retrieval | Downloading python3-pip from the repositories |
| Testing changes | Validation | Verifying package integrity and dependencies |
| Installing packages | Final installation | Installing pip on the system |

🔢 Step 3: Installing NumPy

With pip now available, the original NumPy installation command automatically continues:

Defaulting to user installation because normal site-packages is not writeable
Collecting numpy
  Downloading numpy-2.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
     |████████████████████████████████| 19.5 MB 3.4 MB/s
Installing collected packages: numpy
Successfully installed numpy-2.0.2

Understanding the NumPy Installation Output

Key Information:

  1. User Installation Notice:

    Defaulting to user installation because normal site-packages is not writeable
    
    • Packages install to user directory (~/.local/lib/python3.9/site-packages)
    • No admin privileges required for user-specific installations
    • Safer than system-wide installations, because files managed by the system package manager are left untouched
  2. Package Download Details:

    • Version: numpy-2.0.2
    • Python Compatibility: cp39 (CPython 3.9)
    • Architecture: x86_64 (64-bit Linux)
    • File Size: 19.5 MB
    • Download Speed: 3.4 MB/s
  3. File Format Explanation:

    • .whl = Wheel format (pre-compiled Python package)
    • manylinux_2_17 (also tagged manylinux2014) = Built against glibc 2.17, so the wheel runs on most mainstream Linux distributions
    • Faster installation than compiling from source
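The fields packed into a wheel filename can be pulled apart with a few lines of Python. This is a simplified sketch: parse_wheel_filename is a hypothetical helper, and it handles only the common name-version[-build]-python-abi-platform.whl layout.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its tagged components.

    Hypothetical helper for illustration; it handles the common
    name-version[-build]-python-abi-platform.whl layout only.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename layout: " + filename)
    # The last three fields are always the python, abi, and platform tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        "python": python_tag,
        "abi": abi_tag,
        # A wheel may list several compatible platform tags, joined by dots.
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "numpy-2.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
for key, value in info.items():
    print(key + ":", value)
```

For the NumPy wheel above, the platforms entry contains both manylinux_2_17_x86_64 and manylinux2014_x86_64, which are two names for the same compatibility level.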
💡 What is NumPy? NumPy (Numerical Python) is the fundamental package for scientific computing with Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays.

📊 Step 4: Installing Pandas

Next, we install Pandas, which builds upon NumPy:

pip install pandas

Output:

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Downloading pandas-2.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)
     |████████████████████████████████| 12.4 MB 6.1 MB/s
Collecting tzdata>=2022.7
  Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
     |████████████████████████████████| 347 kB 38.7 MB/s
Requirement already satisfied: numpy>=1.22.4 in /home/centos9/.local/lib/python3.9/site-packages (from pandas) (2.0.2)
Collecting python-dateutil>=2.8.2
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
     |████████████████████████████████| 229 kB 21.1 MB/s
Collecting pytz>=2020.1
  Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
     |████████████████████████████████| 509 kB 38.3 MB/s
Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil>=2.8.2->pandas) (1.15.0)
Installing collected packages: tzdata, pytz, python-dateutil, pandas
Successfully installed pandas-2.3.2 python-dateutil-2.9.0.post0 pytz-2025.2 tzdata-2025.2

Understanding Pandas Installation and Dependencies

Main Package:

  • pandas-2.3.2: The core data manipulation library (12.4 MB)
  • Download Speed: 6.1 MB/s (faster than the NumPy download, most likely because of momentary network conditions)

Dependencies Installed:

| Package | Version | Purpose | Size |
|---|---|---|---|
| tzdata | 2025.2 | Timezone database | 347 kB |
| python-dateutil | 2.9.0.post0 | Date parsing utilities | 229 kB |
| pytz | 2025.2 | Timezone calculations | 509 kB |

Already Satisfied Dependencies:

  • numpy>=1.22.4: Previously installed (2.0.2 satisfies requirement)
  • six>=1.5: System package already available
✅ Dependency Resolution: Notice how pip detected that NumPy was already installed and that version 2.0.2 satisfies the numpy>=1.22.4 requirement declared by Pandas, so it skipped the download. pip's dependency resolver performs this check for every requirement.
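The ">=" check behind "Requirement already satisfied" can be illustrated with a small comparison of version numbers. This is a deliberately simplified sketch, not pip's actual resolver: release_tuple and satisfies_minimum are hypothetical helpers, and suffixes such as .post0 are ignored.

```python
import re


def release_tuple(version, width=3):
    """Turn '2.9.0.post0' into (2, 9, 0), padding short versions with zeros.

    Simplified on purpose: pre-release and post-release suffixes are
    ignored, which pip's real version handling does not do.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric release in " + repr(version))
    numbers = [int(piece) for piece in match.group(0).split(".")]
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def satisfies_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (a '>=' check)."""
    return release_tuple(installed) >= release_tuple(minimum)


# Installed versions and minimums taken from the pip output above.
checks = [
    ("numpy", "2.0.2", "1.22.4"),
    ("six", "1.15.0", "1.5"),
]
for name, installed, minimum in checks:
    print(name, installed, ">=", minimum, "->", satisfies_minimum(installed, minimum))
```

Comparing numeric tuples matters here: as plain strings, "1.15.0" sorts before "1.5", which would wrongly report six as too old.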

🤖 Step 5: Installing Scikit-learn

Finally, we install Scikit-learn for machine learning capabilities:

pip install scikit-learn

Output:

Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.6.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
     |████████████████████████████████| 13.5 MB 19.4 MB/s
Collecting scipy>=1.6.0
  Downloading scipy-1.13.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.6 MB)
     |████████████████████████████████| 38.6 MB 350 kB/s
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Requirement already satisfied: numpy>=1.19.5 in /home/centos9/.local/lib/python3.9/site-packages (from scikit-learn) (2.0.2)
Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
     |████████████████████████████████| 308 kB 13.3 MB/s
Installing collected packages: threadpoolctl, scipy, joblib, scikit-learn
Successfully installed joblib-1.5.2 scikit-learn-1.6.1 scipy-1.13.1 threadpoolctl-3.6.0

Understanding Scikit-learn Installation

Main Package:

  • scikit-learn-1.6.1: Machine learning library (13.5 MB)
  • Excellent Download Speed: 19.4 MB/s

Major Dependencies:

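| Package | Version | Purpose | Size |
|---|---|---|---|
| scipy | 1.13.1 | Scientific computing algorithms | 38.6 MB |
| joblib | 1.5.2 | Parallel computing utilities | 308 kB |
| threadpoolctl | 3.6.0 | Limits the threads used by native libraries such as BLAS and OpenMP | 18 kB |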
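The requirements that pip resolves come from metadata shipped with each package, and the standard library can read that metadata after installation. A minimal sketch, assuming a hypothetical helper named declared_requirements:

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares.

    Hypothetical helper: returns an empty list if the distribution is
    not installed or declares no requirements.
    """
    try:
        requirements = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return []
    return list(requirements or [])


for requirement in declared_requirements("scikit-learn"):
    # Entries containing '; extra == ...' are optional extras, not core dependencies.
    print(requirement)
```

On a system where scikit-learn is not installed the loop simply prints nothing, so the script is safe to run before or after the installation.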

Notable Observations:

  • SciPy is the largest package (38.6 MB) due to compiled mathematical algorithms
  • Download speed variation: SciPy downloaded at only 350 kB/s, most likely because of network congestion or server load
  • NumPy dependency satisfied: Our previously installed NumPy 2.0.2 meets the >=1.19.5 requirement
⚠️ Download Speed Variations: Notice how download speeds varied significantly between packages. This is normal and depends on server load, network conditions, and package repository locations.
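To put the reported figures in perspective, the sizes and speeds from the pip output can be turned into rough transfer times and a total download size. This sketch assumes pip's units are decimal (1 MB = 1000 kB) and that each reported speed held for the whole download, which is only an approximation.

```python
# (size in MB, speed in MB/s) as reported in the pip output above.
# Assumes pip's units are decimal (1 MB = 1000 kB).
downloads = {
    "numpy": (19.5, 3.4),
    "pandas": (12.4, 6.1),
    "scikit-learn": (13.5, 19.4),
    "scipy": (38.6, 0.350),
}

# Sizes in MB of the small wheels:
# tzdata, python-dateutil, pytz, threadpoolctl, joblib.
small_wheels_mb = [0.347, 0.229, 0.509, 0.018, 0.308]


def estimated_seconds(size_mb, speed_mb_per_s):
    """Rough transfer time, assuming the reported speed held throughout."""
    return size_mb / speed_mb_per_s


for name, (size_mb, speed) in downloads.items():
    seconds = estimated_seconds(size_mb, speed)
    print(f"{name:<13} {size_mb:>5.1f} MB  ~{seconds:6.1f} s")

total_mb = sum(size for size, _ in downloads.values()) + sum(small_wheels_mb)
print(f"Total download: {total_mb:.1f} MB")
```

Under these assumptions SciPy alone accounts for roughly 110 seconds, far more than the other wheels combined, while the total comes to about 85.4 MB.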

📋 Installation Summary

Let's summarize what we've accomplished:

Packages Successfully Installed

| Category | Package | Version | Primary Use |
|---|---|---|---|
| Core Arrays | numpy | 2.0.2 | Numerical computations and arrays |
| Data Analysis | pandas | 2.3.2 | Data manipulation and analysis |
| Machine Learning | scikit-learn | 1.6.1 | Machine learning algorithms |
| Scientific Computing | scipy | 1.13.1 | Advanced mathematical functions |
| Utilities | Supporting packages | Various | Date handling, parallel processing, etc. |

Total Resources Used

  • Total Download Size: ~85 MB across all packages
  • Installation Location: ~/.local/lib/python3.9/site-packages/
  • Installation Type: User-level (no admin privileges required)
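A quick way to verify the installation is to import each package and print its version and location. The sketch below relies only on the standard library for the checks; check_package and the PACKAGES mapping are names made up for this example.

```python
import importlib
import site
from importlib import metadata

# Import name -> distribution name on PyPI.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
    "scipy": "scipy",
}


def check_package(import_name, dist_name):
    """Return (version, location) if the package imports, else (None, None)."""
    try:
        module = importlib.import_module(import_name)
        version = metadata.version(dist_name)
    except (ImportError, metadata.PackageNotFoundError):
        return None, None
    return version, getattr(module, "__file__", None)


print("User site-packages:", site.getusersitepackages())
for import_name, dist_name in PACKAGES.items():
    version, location = check_package(import_name, dist_name)
    if version is None:
        print(f"{dist_name:<13} NOT INSTALLED")
    else:
        print(f"{dist_name:<13} {version:<10} {location}")
```

After a user-level install, each reported location should sit under the user site-packages directory printed on the first line. Note that scikit-learn is imported as sklearn, which is why the mapping lists both names.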

🎯 Key Takeaways

✅ Remember These Points

  1. Check Python Version First: Always verify your Python installation before installing packages
  2. User vs System Installation: User installations are safer and don't require admin privileges
  3. Dependency Management: pip automatically resolves and installs package dependencies
  4. Download Variations: Package download speeds can vary significantly due to network conditions
  5. Version Compatibility: pip only installs releases that support your Python version, so an older interpreter can mean older package versions

🚀 What's Next?

Now that you have the essential data science packages installed, you're ready to start coding! In our next post, we'll explore:

  • Creating and testing simple scripts with these packages
  • Understanding basic NumPy array operations
  • Working with Pandas DataFrames
  • Loading datasets with Scikit-learn
  • Practical examples and code demonstrations

The foundation is set – let's start building amazing data science projects!

🔧 Troubleshooting Tips

Common Issues and Solutions:

  • Permission Denied: Use the --user flag with pip for user installations rather than running pip with sudo
  • Command Not Found: Ensure pip is installed and on your PATH, or run it as python3 -m pip
  • Version Conflicts: Use virtual environments (python3 -m venv) to isolate package versions
  • Slow Downloads: Retry later, or point pip at a closer mirror with the --index-url flag
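A related cause of "command not found" is that scripts installed with --user land in a directory that is not on PATH. The sketch below checks for that; user_bin_on_path is a hypothetical helper, and the bin subdirectory is an assumption that matches the usual Linux layout.

```python
import os
import site


def user_bin_on_path(path_value, user_bin):
    """Return True if `user_bin` is one of the directories in `path_value`."""
    wanted = os.path.normpath(user_bin)
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


# On Linux, `pip install --user` places scripts in <user base>/bin.
user_bin = os.path.join(site.getuserbase(), "bin")
if user_bin_on_path(os.environ.get("PATH", ""), user_bin):
    print(user_bin, "is on PATH")
else:
    print(user_bin, "is NOT on PATH; commands installed with --user may not be found")
```

If the directory is missing from PATH, adding it in the shell profile is the usual fix; importing the packages from Python works either way.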

🎉 Congratulations! You've successfully set up a complete Python data science environment on CentOS. Your system is now equipped with NumPy, Pandas, and Scikit-learn – the essential trinity of Python data science packages.

Ready to start coding? Check out the next post in this series where we'll test these installations with practical examples!

💬 Discussion

Have you encountered any issues during your Python data science setup?

  • Which package took the longest to install on your system?
  • Have you tried installing these packages on other Linux distributions?
  • What data science projects are you planning to work on?
  • Did you prefer user installation or would you use virtual environments?

Connect with me:

  • 🐙 GitHub - Data science examples and scripts
  • 🐦 Twitter - Quick tips and updates
  • 📧 Contact - Data science discussions and questions

This installation guide covers the fundamental setup process for Python data science packages. As package versions evolve, some details may change, but the core installation principles remain constant.

Written by Owais

I'm an AIOps Engineer with a passion for AI, Operating Systems, Cloud, and Security, sharing insights that matter in today's tech world.
