Testing Python Data Science Packages: Practical Examples with NumPy, Pandas, and Scikit-learn

After successfully installing NumPy, Pandas, and Scikit-learn, it's crucial to verify that everything works correctly. This hands-on guide demonstrates how to test your installation by creating practical examples with each package, complete with explanations of the code and output.


🎯 What You'll Learn: In this practical tutorial, you'll discover:

How to create and test Python scripts for each data science package
Understanding NumPy arrays and their basic operations
Working with Pandas DataFrames for data manipulation
Loading and exploring datasets with Scikit-learn
Interpreting output and verifying successful installations
Best practices for organizing test scripts

🛠️ Setting Up Test Environment

Before we start testing our packages, let's understand the workflow for creating and running test scripts on a Linux system.

Creating Test Files

We'll create separate test files for each package to isolate functionality and make debugging easier.

🔢 Testing NumPy: Array Operations

Let's start with NumPy, the foundation of numerical computing in Python.

Creating the NumPy Test Script

touch test_numpy.py

What this command does:

touch creates an empty file named test_numpy.py
If the file already exists, it updates the timestamp
This is a common way to create an empty file on Linux

Editing the Script

nano test_numpy.py

After editing in nano (a terminal-based text editor), let's examine what we created:

cat test_numpy.py

Output:

import numpy as np

array = np.array([1,2,3,4,5])

print(f"NumPy array: {array}")

Understanding the NumPy Test Code

Let's break down each line of our NumPy test:

Line	Code	Purpose
1	`import numpy as np`	Imports NumPy library with alias 'np' (standard convention)
2	`array = np.array([1,2,3,4,5])`	Creates a NumPy array from a Python list
3	`print(f"NumPy array: {array}")`	Displays the array using f-string formatting

Running the NumPy Test

python test_numpy.py

Output:

NumPy array: [1 2 3 4 5]

What the output tells us:

Successful Import: NumPy imported without errors
Array Creation: The np.array() function worked correctly
Display Format: No commas appear between the numbers, because print() uses NumPy's own string representation of arrays
Array Type: The value is a NumPy ndarray rather than a Python list, so its elements are stored in a compact, typed buffer


✅ NumPy Test Success: The output confirms that NumPy is working correctly. The array displays as [1 2 3 4 5] without commas, which is how NumPy prints a one-dimensional array.
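
To see that the comma-free display is only a matter of how NumPy prints arrays, a short sketch like the following may help. It compares the str() and repr() forms of the same array and converts it back to a plain list. The variable names as_str, as_repr, and as_list are illustrative and not part of the original test script.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

# str() is what print() and f-strings use: space-separated, no commas
as_str = str(array)
# repr() is what an interactive prompt shows: commas plus the array() wrapper
as_repr = repr(array)
# tolist() converts back to a plain Python list of ints
as_list = array.tolist()

print(as_str)            # [1 2 3 4 5]
print(as_repr)           # array([1, 2, 3, 4, 5])
print(as_list)           # [1, 2, 3, 4, 5]
print(array.dtype.kind)  # i (signed integer; the exact width depends on the platform)
```

If the first line of output matches what test_numpy.py printed, the two scripts are exercising the same array formatting.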

📊 Testing Pandas: DataFrame Operations

Now let's test Pandas for data manipulation capabilities.

Creating the Pandas Test Script

nano test_pandas.py

Let's examine the Pandas test script:

cat test_pandas.py

Output:

import pandas as pd

data = {'Name': ['Alice', 'Bob'], 'Age': [24, 30]}

df = pd.DataFrame(data)

print(f"Pandas DataFrame:\n{df}")

Understanding the Pandas Test Code

Line	Code	Explanation
1	`import pandas as pd`	Imports Pandas with standard alias 'pd'
2	`data = {'Name': ['Alice', 'Bob'], 'Age': [24, 30]}`	Creates a Python dictionary with sample data
3	`df = pd.DataFrame(data)`	Converts dictionary to Pandas DataFrame
4	`print(f"Pandas DataFrame:\n{df}")`	Displays the DataFrame with newline formatting

Running the Pandas Test

python test_pandas.py

Output:

Pandas DataFrame:
    Name  Age
0  Alice   24
1    Bob   30

Understanding the Pandas Output

Let's analyze what this DataFrame output means:

Structure Analysis:

    Name  Age     ← Column headers
0  Alice   24     ← Row 0 (index 0)
1    Bob   30     ← Row 1 (index 1)

Key Elements:

Element	Description	Value
Index	Row identifiers (leftmost column)	0, 1
Columns	Data categories	Name, Age
Data Types	Automatically inferred	String (Name), Integer (Age)
Alignment	Automatic column alignment	Properly formatted table


💡 DataFrame Features: Notice how Pandas automatically formatted the data into a table structure with proper alignment and automatically assigned row indices (0, 1). This is one of Pandas' key strengths for data analysis.
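
The index, column labels, and inferred data types described above can also be inspected directly rather than read off the printed table. The following is a minimal sketch using the same two-row data. The dtype reported for the Name column is left uncommented because its name depends on the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 30]})

# The default index is a RangeIndex starting at 0
print(list(df.index))      # [0, 1]
# Column labels come from the dictionary keys
print(list(df.columns))    # ['Name', 'Age']
# One inferred dtype per column; the text dtype's name varies by pandas version
print(df.dtypes)
# Label-based lookup: the row with index 1, column 'Age'
print(df.loc[1, 'Age'])    # 30
```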

🤖 Testing Scikit-learn: Dataset Loading

Finally, let's test Scikit-learn with a real dataset.

Creating the Scikit-learn Test Script

nano test_sklearn.py

Let's examine the Scikit-learn test script:

cat test_sklearn.py

Output:

from sklearn.datasets import load_iris

iris = load_iris()

print("Iris dataset, target names:", iris.target_names)

Understanding the Scikit-learn Test Code

Line	Code	Purpose
1	`from sklearn.datasets import load_iris`	Imports the Iris dataset loader function
2	`iris = load_iris()`	Loads the complete Iris dataset into memory
3	`print("Iris dataset, target names:", iris.target_names)`	Displays the classification categories

Running the Scikit-learn Test

python test_sklearn.py

Output:

Iris dataset, target names: ['setosa' 'versicolor' 'virginica']

Understanding the Scikit-learn Output

What this output reveals:

Successful Import: Scikit-learn and its datasets module loaded correctly
Dataset Access: The load_iris() function worked and returned a dataset object
Classification Labels: The Iris dataset contains three species of iris flowers
Array Format: Output shows NumPy array format (no commas) containing three string labels

About the Iris Dataset:

Classic Dataset: One of the most famous datasets in machine learning
150 Samples: 50 samples of each of the three species
4 Features: Sepal length, sepal width, petal length, petal width
3 Classes: setosa, versicolor, virginica (iris species)
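
The figures in this list can be checked against the loaded dataset itself. The sketch below assumes only the attributes already used in this guide (data, target, target_names) plus np.bincount to count samples per class. The names n_samples, n_features, and class_counts are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# data holds one row per sample and one column per feature
n_samples, n_features = iris.data.shape
# target holds integer class labels 0, 1, 2; bincount tallies each one
class_counts = np.bincount(iris.target)

print(n_samples, n_features)      # 150 4
print(class_counts.tolist())      # [50, 50, 50]
print(len(iris.target_names))     # 3
```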


✅ Complete Success: All three packages import and run correctly. Your Python data science environment is ready for real work.

📁 File Management and Organization

Let's look at how our test files are organized in the system:

Listing Our Test Files

After creating all test scripts, our directory contains:

ls -la *.py

Expected output (similar to):

-rw-rw-r--. 1 centos9 centos9 87 test_numpy.py
-rw-rw-r--. 1 centos9 centos9 134 test_pandas.py
-rw-rw-r--. 1 centos9 centos9 89 test_sklearn.py

File Details:

File	Size (bytes)	Purpose	Complexity
test_numpy.py	87	Basic array creation and display	Simple
test_pandas.py	134	DataFrame creation from dictionary	Moderate
test_sklearn.py	89	Dataset loading and inspection	Simple

🧪 Advanced Testing Ideas

Now that basic functionality is confirmed, here are some additional tests you can try:

Enhanced NumPy Test

import numpy as np

# Test array operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(f"Array 1: {arr1}")
print(f"Array 2: {arr2}")
print(f"Addition: {arr1 + arr2}")
print(f"Element-wise multiplication: {arr1 * arr2}")
print(f"Array shape: {arr1.shape}")
print(f"Array data type: {arr1.dtype}")

Enhanced Pandas Test

import pandas as pd

# Test DataFrame operations
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [24, 30, 35, 28],
    'City': ['New York', 'London', 'Tokyo', 'Paris']
}

df = pd.DataFrame(data)

print("Complete DataFrame:")
print(df)
print(f"\nDataFrame shape: {df.shape}")
print(f"\nColumn names: {list(df.columns)}")
print(f"\nAge statistics:\n{df['Age'].describe()}")

Enhanced Scikit-learn Test

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset and explore structure
iris = load_iris()

print(f"Dataset shape: {iris.data.shape}")
print(f"Number of features: {len(iris.feature_names)}")
print(f"Feature names: {iris.feature_names}")
print(f"Target names: {iris.target_names}")

# Test train_test_split functionality
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

print(f"\nTraining set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
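
Printed sizes still have to be checked by eye. One possible variation, sketched here, replaces the prints with assert statements so the script fails loudly if the split is not what you expect. With 150 samples and test_size=0.2, the split should give 30 test rows and 120 training rows.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# 20% of 150 samples is 30 for testing, leaving 120 for training
assert X_test.shape == (30, 4)
assert X_train.shape == (120, 4)
# Features and labels must stay aligned after the split
assert len(y_train) == len(X_train)
assert len(y_test) == len(X_test)

print("train_test_split produced the expected sizes")
```

A script written this way exits with an AssertionError and a non-zero status when something is off, which makes it easier to rerun after upgrading packages.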

🔍 Troubleshooting Common Issues

Import Errors

If you encounter import errors, check:

Package Installation: Verify packages are installed in the correct environment
Python Path: Ensure Python can find the installed packages
Virtual Environment: Check if you're in the correct virtual environment
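
These three checks can be partly automated. The following is a rough diagnostic sketch, not a complete tool: it prints which interpreter is running, guesses whether a venv-style virtual environment is active, and reports whether each module can be located without importing it. The comparison of sys.prefix with sys.base_prefix is an assumption that holds for venv and virtualenv but not necessarily for other environment managers such as conda.

```python
import importlib.util
import sys

# The interpreter actually running this script
print(f"Python executable: {sys.executable}")
# sys.prefix differs from sys.base_prefix inside a venv-style virtual environment
in_venv = sys.prefix != sys.base_prefix
print(f"Inside a virtual environment: {in_venv}")

# find_spec returns None when a top-level module cannot be located
status = {}
for module_name in ("numpy", "pandas", "sklearn"):
    status[module_name] = importlib.util.find_spec(module_name) is not None
    print(f"{module_name}: {'found' if status[module_name] else 'NOT found'}")
```

If a module is reported as not found, the executable path shown on the first line tells you which Python installation needs the package.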

Version Compatibility

# Check versions
import numpy as np
import pandas as pd
import sklearn

print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
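
When an import itself is failing, the version attributes above cannot be read. As an alternative sketch, the standard library's importlib.metadata (Python 3.8 and later) can report installed versions without importing the packages. Note that it expects the distribution name used by pip, so Scikit-learn is looked up as "scikit-learn" rather than "sklearn".

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used by pip; note "scikit-learn", not "sklearn"
versions = {}
for dist_name in ("numpy", "pandas", "scikit-learn"):
    try:
        versions[dist_name] = version(dist_name)
    except PackageNotFoundError:
        versions[dist_name] = None  # not installed in this environment

for dist_name, found in versions.items():
    print(f"{dist_name}: {found or 'not installed'}")
```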

Performance Testing

import time
import numpy as np

# Time the creation and summation of a 1000x1000 random array.
# perf_counter() is a high-resolution clock intended for measuring intervals.
start_time = time.perf_counter()
large_array = np.random.random((1000, 1000))
result = np.sum(large_array)
end_time = time.perf_counter()

print(f"Large array sum: {result:.4f}")
print(f"Elapsed time: {end_time - start_time:.4f} seconds")
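
A single timed run can be noisy. As a hedged sketch, the standard timeit module can repeat a measurement and compare np.sum on an array with the built-in sum on an equivalent list. The array size and the number of repetitions are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(seed=0)
values = rng.random(100_000)
values_list = values.tolist()

# timeit runs each callable several times, which smooths out one-off noise
numpy_time = timeit.timeit(lambda: np.sum(values), number=20)
python_time = timeit.timeit(lambda: sum(values_list), number=20)

print(f"np.sum on ndarray:      {numpy_time:.4f} s for 20 runs")
print(f"built-in sum on a list: {python_time:.4f} s for 20 runs")

# Both approaches should agree up to floating-point rounding
print(np.isclose(np.sum(values), sum(values_list)))
```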

🎯 Key Takeaways

✅ Remember These Points

Test Each Package Separately: Isolate functionality to identify specific issues
Understand Output Formats: Each package has distinct output representations
Use Standard Names: np and pd are the conventional aliases for NumPy and Pandas; Scikit-learn is imported under its package name, sklearn
Start Simple: Basic tests confirm installation before complex operations
Incremental Complexity: Build from simple imports to advanced functionality

📚 Learning Resources

NumPy Documentation: numpy.org
Pandas Tutorials: pandas.pydata.org
Scikit-learn Guide: scikit-learn.org
Jupyter Notebooks: Consider installing Jupyter for interactive development


🎉 Testing Complete! You've successfully verified that NumPy, Pandas, and Scikit-learn are working correctly on your system. Your Python data science environment is now ready for real-world projects and advanced learning.

Ready for your first data science project? You now have all the tools needed to start analyzing data, building models, and creating insights!

💬 Discussion

How did your testing experience go?

Which package output surprised you the most?
Have you tried the enhanced testing examples?
What type of data science project interests you most?
Did you encounter any issues during testing?


This testing guide provides practical verification methods for your Python data science installation. Regular testing ensures your environment remains functional as you add new packages and update existing ones.


Written by Owais
