Setting up a Python data science environment can be daunting for beginners, especially when dealing with package dependencies and system-specific installations. This comprehensive guide walks you through installing the three most essential Python data science packages on CentOS: NumPy, Pandas, and Scikit-learn.
What You'll Learn: In this practical tutorial, you'll discover:
- How to check your Python version and install pip on CentOS
- Step-by-step installation of NumPy, Pandas, and Scikit-learn
- Understanding package dependencies and installation output
- Troubleshooting common installation issues
- Verifying successful package installations
Step 1: Checking Your Python Environment
Before installing any packages, it's crucial to verify your Python installation and version.
Checking Python Version
python --version
Output:
Python 3.9.23
What This Tells Us:
- Python 3.9.23 is installed and accessible
- This version is supported by every package release installed in this guide
- The python command points to Python 3 (good for modern systems)
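Version Compatibility: Python 3.9.23 works with every package installed in this guide. pip selects the newest release of each package that still supports Python 3.9, which is not always the latest release overall; NumPy 2.1 and later, for example, require Python 3.10 or newer.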
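The same check can be done from inside Python, which is handy in setup scripts. The sketch below is only an illustration: the `MINIMUM` value and the `meets_minimum` helper are assumptions made for this example, chosen to match the Python 3.9 interpreter used in this guide.

```python
import sys

# Hypothetical minimum for this guide; the tutorial itself uses Python 3.9.
MINIMUM = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True when the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the running interpreter, similar to `python --version`.
print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Meets minimum:", meets_minimum())
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.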
Step 2: Installing pip (Python Package Manager)
Most CentOS systems don't come with pip pre-installed. Let's install it first.
Initial pip Installation Attempt
pip install numpy
Output:
bash: pip: command not found...
Install package 'python3-pip' to provide command 'pip'? [N/y] y
What Happened:
- The system detected that pip is not installed
- CentOS helpfully suggested installing python3-pip
- We accepted the installation prompt by typing y
System Package Installation Process
After accepting the pip installation, the system goes through several phases:
* Waiting in queue...
* Loading list of packages....
The following packages have to be installed:
python3-pip-21.3.1-1.el9.noarch A tool for installing and managing Python3 packages
Proceed with changes? [N/y] y
Installation Phases:
Phase | Description | What's Happening |
---|---|---|
Waiting in queue | Package manager queue | System is queuing the installation request |
Waiting for authentication | User permissions | Confirming the user is authorized to install system packages |
Downloading packages | Package retrieval | Downloading python3-pip from repositories |
Testing changes | Validation | Verifying package integrity and dependencies |
Installing packages | Final installation | Actually installing pip to the system |
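Once the system package is installed, you can confirm that pip is reachable without running an install. This is a minimal sketch using only the standard library; the `pip_status` helper name is made up for this example.

```python
import importlib.util
import shutil


def pip_status():
    """Report whether pip is importable and whether a pip executable is on PATH."""
    return {
        # True when `python -m pip` would work for this interpreter.
        "module_available": importlib.util.find_spec("pip") is not None,
        # Full path of the first pip (or pip3) executable found on PATH, else None.
        "executable_path": shutil.which("pip") or shutil.which("pip3"),
    }


status = pip_status()
print("pip module importable:", status["module_available"])
print("pip executable on PATH:", status["executable_path"] or "not found")
```

If the module is importable but no executable is found, running `python -m pip install numpy` is a common way to invoke pip for that specific interpreter.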
Step 3: Installing NumPy
With pip now available, the original NumPy installation command automatically continues:
Defaulting to user installation because normal site-packages is not writeable
Collecting numpy
Downloading numpy-2.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
|████████████████████████████████| 19.5 MB 3.4 MB/s
Installing collected packages: numpy
Successfully installed numpy-2.0.2
Understanding the NumPy Installation Output
Key Information:
- User Installation Notice: Defaulting to user installation because normal site-packages is not writeable
  - Packages install to user directory (~/.local/lib/python3.9/site-packages)
  - No admin privileges required for user-specific installations
  - Safer than system-wide installations
- Package Download Details:
  - Version: numpy-2.0.2
  - Python Compatibility: cp39 (CPython 3.9)
  - Architecture: x86_64 (64-bit Linux)
  - File Size: 19.5 MB
  - Download Speed: 3.4 MB/s
- File Format Explanation:
  - .whl = Wheel format (pre-compiled Python package)
  - manylinux = Compatible with many Linux distributions
  - Faster installation than compiling from source
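To make the filename breakdown above concrete, here is a small sketch that splits a wheel filename into its tags. It assumes the common five-field form (name, version, Python tag, ABI tag, platform tag) with no optional build tag, and `parse_wheel_filename` is a helper written for this example, not part of pip.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated tags.

    Assumes the five-field form without an optional build tag:
    name-version-python_tag-abi_tag-platform_tag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    fields = filename[: -len(".whl")].split("-")
    if len(fields) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(fields)}")
    name, version, python_tag, abi_tag, platform_tag = fields
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        # A wheel may list several compatible platforms separated by dots.
        "platform_tags": platform_tag.split("."),
    }


# Filename copied from the pip output above.
info = parse_wheel_filename(
    "numpy-2.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
for key, value in info.items():
    print(key, "=", value)
```

The two platform tags in the NumPy filename name the same kind of target in an older and a newer spelling, which is why both appear in one file.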
What is NumPy? NumPy (Numerical Python) is the fundamental package for scientific computing with Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays.
Step 4: Installing Pandas
Next, we install Pandas, which builds upon NumPy:
pip install pandas
Output:
Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
Downloading pandas-2.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)
|████████████████████████████████| 12.4 MB 6.1 MB/s
Collecting tzdata>=2022.7
Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
|████████████████████████████████| 347 kB 38.7 MB/s
Requirement already satisfied: numpy>=1.22.4 in /home/centos9/.local/lib/python3.9/site-packages (from pandas) (2.0.2)
Collecting python-dateutil>=2.8.2
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
|████████████████████████████████| 229 kB 21.1 MB/s
Collecting pytz>=2020.1
Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
|████████████████████████████████| 509 kB 38.3 MB/s
Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil>=2.8.2->pandas) (1.15.0)
Installing collected packages: tzdata, pytz, python-dateutil, pandas
Successfully installed pandas-2.3.2 python-dateutil-2.9.0.post0 pytz-2025.2 tzdata-2025.2
Understanding Pandas Installation and Dependencies
Main Package:
- pandas-2.3.2: The core data manipulation library (12.4 MB)
- Download Speed: 6.1 MB/s (faster than the NumPy download; speeds vary with network conditions)
Dependencies Installed:
Package | Version | Purpose | Size |
---|---|---|---|
tzdata | 2025.2 | Timezone database | 347 kB |
python-dateutil | 2.9.0.post0 | Date parsing utilities | 229 kB |
pytz | 2025.2 | Timezone calculations | 509 kB |
Already Satisfied Dependencies:
- numpy>=1.22.4: Previously installed (2.0.2 satisfies requirement)
- six>=1.5: System package already available
Dependency Resolution: Notice how pip automatically detected that NumPy was already installed and satisfied the version requirement. This is pip's intelligent dependency management at work.
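The idea behind a "Requirement already satisfied" line can be sketched in a few lines of Python. This is a deliberately simplified illustration that handles only plain numeric versions and a single minimum bound; pip's real resolver supports the full version specification, and the helper names here are made up for the example.

```python
def version_tuple(version):
    """Convert a purely numeric dotted version such as '2.0.2' to (2, 0, 2)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum` (numeric versions only)."""
    return version_tuple(installed) >= version_tuple(minimum)


# Values taken from the pip output above.
print("numpy>=1.22.4 with 2.0.2:", satisfies_minimum("2.0.2", "1.22.4"))
print("six>=1.5 with 1.15.0:", satisfies_minimum("1.15.0", "1.5"))
```

Versions are compared as tuples of integers rather than as strings, because a plain string comparison would wrongly rank 1.15.0 below 1.5.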
Step 5: Installing Scikit-learn
Finally, we install Scikit-learn for machine learning capabilities:
pip install scikit-learn
Output:
Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
Downloading scikit_learn-1.6.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
|████████████████████████████████| 13.5 MB 19.4 MB/s
Collecting scipy>=1.6.0
Downloading scipy-1.13.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.6 MB)
|████████████████████████████████| 38.6 MB 350 kB/s
Collecting threadpoolctl>=3.1.0
Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Requirement already satisfied: numpy>=1.19.5 in /home/centos9/.local/lib/python3.9/site-packages (from scikit-learn) (2.0.2)
Collecting joblib>=1.2.0
Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
|████████████████████████████████| 308 kB 13.3 MB/s
Installing collected packages: threadpoolctl, scipy, joblib, scikit-learn
Successfully installed joblib-1.5.2 scikit-learn-1.6.1 scipy-1.13.1 threadpoolctl-3.6.0
Understanding Scikit-learn Installation
Main Package:
- scikit-learn-1.6.1: Machine learning library (13.5 MB)
- Excellent Download Speed: 19.4 MB/s
Major Dependencies:
Package | Version | Purpose | Size |
---|---|---|---|
scipy | 1.13.1 | Scientific computing algorithms | 38.6 MB |
joblib | 1.5.2 | Parallel computing utilities | 308 kB |
threadpoolctl | 3.6.0 | Thread pool control | 18 kB |
Notable Observations:
- SciPy is the largest package (38.6 MB) due to compiled mathematical algorithms
- Download speed variation: SciPy downloaded at only 350 kB/s (possibly due to network congestion or server load)
- NumPy dependency satisfied: Our previously installed NumPy 2.0.2 meets the >=1.19.5 requirement
Download Speed Variations: Notice how download speeds varied significantly between packages. This is normal and depends on server load, network conditions, and package repository locations.
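To see what those speed differences mean in practice, the rough arithmetic below divides each package's size by the speed pip reported. It is only an estimate: it assumes the reported speed held for the whole download and treats 350 kB/s as 0.35 MB/s.

```python
# (package, size in MB, reported speed in MB/s), copied from the pip output above.
downloads = [
    ("numpy", 19.5, 3.4),
    ("pandas", 12.4, 6.1),
    ("scikit-learn", 13.5, 19.4),
    ("scipy", 38.6, 0.35),
]


def estimated_seconds(size_mb, speed_mb_per_s):
    """Rough transfer time, assuming the reported speed held for the whole download."""
    return size_mb / speed_mb_per_s


for name, size_mb, speed in downloads:
    seconds = estimated_seconds(size_mb, speed)
    print(f"{name:<13} {size_mb:>5.1f} MB  ~{seconds:.0f} s")
```

By this estimate the SciPy wheel alone would take nearly two minutes, while the other three packages finish within a few seconds each.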
Installation Summary
Let's summarize what we've accomplished:
Packages Successfully Installed
Category | Package | Version | Primary Use |
---|---|---|---|
Core Arrays | numpy | 2.0.2 | Numerical computations and arrays |
Data Analysis | pandas | 2.3.2 | Data manipulation and analysis |
Machine Learning | scikit-learn | 1.6.1 | Machine learning algorithms |
Scientific Computing | scipy | 1.13.1 | Advanced mathematical functions |
Utilities | Supporting packages | Various | Date handling, parallel processing, etc. |
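A quick way to verify the installations is to ask Python which versions it can see. The sketch below uses the standard library's `importlib.metadata` (available in Python 3.8 and later); the `installed_versions` helper and the `PACKAGES` list are assumptions made for this example.

```python
from importlib import metadata

# Distribution names as they appear in the pip output above.
PACKAGES = ["numpy", "pandas", "scikit-learn", "scipy"]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:<13} {version or 'not installed'}")
```

On a system set up as in this guide, the printed versions should match the table above. A "not installed" entry usually means the package was installed for a different Python interpreter.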
Total Resources Used
- Total Download Size: ~85 MB across all packages
- Installation Location: ~/.local/lib/python3.9/site-packages/
- Installation Type: User-level (no admin privileges required)
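If you want to confirm where user-level installs land on your own machine, the standard library's `site` module can tell you. This is a small sketch; the `user_install_info` helper is a name invented for this example.

```python
import site
import sys


def user_install_info():
    """Collect where pip's user-level installs go for this interpreter."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),
        "user_site": user_site,
        # False when user site-packages are disabled (for example inside a virtual environment).
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # The directory is only added to sys.path if it exists at startup.
        "user_site_on_sys_path": user_site in sys.path,
    }


for key, value in user_install_info().items():
    print(f"{key}: {value}")
```

On the system used in this guide, the user site directory should be the ~/.local/lib/python3.9/site-packages path shown in the pip output.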
Key Takeaways
Remember These Points
- Check Python Version First: Always verify your Python installation before installing packages
- User vs System Installation: User installations are safer and don't require admin privileges
- Dependency Management: pip automatically resolves and installs package dependencies
- Download Variations: Package download speeds can vary significantly due to network conditions
- Version Compatibility: pip picks package versions that are compatible with your Python version and with each other
What's Next?
Now that you have the essential data science packages installed, you're ready to start coding! In our next post, we'll explore:
- Creating and testing simple scripts with these packages
- Understanding basic NumPy array operations
- Working with Pandas DataFrames
- Loading datasets with Scikit-learn
- Practical examples and code demonstrations
The foundation is set. Let's start building amazing data science projects!
Troubleshooting Tips
Common Issues and Solutions:
- Permission Denied: Use the --user flag with pip for user installations
- Command Not Found: Ensure pip is installed and accessible in your PATH
- Version Conflicts: Use virtual environments to isolate package versions
- Slow Downloads: Try different times of day, or use pip's --index-url flag to point at a closer mirror
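For the version-conflict case, a virtual environment can be created with Python's built-in `venv` module. The sketch below is one possible approach, not the only one; the `create_environment` helper and the `~/ds-env` directory name are illustrative assumptions.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment at `env_dir` and return its path."""
    env_path = Path(env_dir).expanduser()
    # with_pip=True asks venv to bootstrap pip inside the new environment.
    venv.create(env_path, with_pip=with_pip)
    return env_path


# Example call (not run here); the directory name is only an illustration:
# create_environment("~/ds-env")
# Then, in a shell:
#   source ~/ds-env/bin/activate
#   pip install numpy pandas scikit-learn
```

Packages installed while the environment is active stay inside that directory, so they cannot conflict with the user-level or system-level packages described earlier.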
Congratulations! You've successfully set up a complete Python data science environment on CentOS. Your system is now equipped with NumPy, Pandas, and Scikit-learn, the essential trinity of Python data science packages.
Ready to start coding? Check out the next post in this series where we'll test these installations with practical examples!
Discussion
Have you encountered any issues during your Python data science setup?
- Which package took the longest to install on your system?
- Have you tried installing these packages on other Linux distributions?
- What data science projects are you planning to work on?
- Did you prefer user installation or would you use virtual environments?
This installation guide covers the fundamental setup process for Python data science packages. As package versions evolve, some details may change, but the core installation principles remain constant.