Testing Python Data Science Packages: Practical Examples with NumPy, Pandas, and Scikit-learn

Learn how to test and verify your Python data science installation with practical examples. Create your first NumPy arrays, Pandas DataFrames, and load Scikit-learn datasets.

After successfully installing NumPy, Pandas, and Scikit-learn, it's crucial to verify that everything works correctly. This hands-on guide demonstrates how to test your installation by creating practical examples with each package, complete with explanations of the code and output.

🎯 What You'll Learn: In this practical tutorial, you'll learn how to:

  • Create and run a test script for each data science package
  • Build NumPy arrays and apply basic operations to them
  • Work with Pandas DataFrames for data manipulation
  • Load and explore datasets with Scikit-learn
  • Interpret the output to confirm a working installation
  • Organize your test scripts

🛠️ Setting Up Test Environment

Before we start testing our packages, let's understand the workflow for creating and running test scripts on a Linux system.

Creating Test Files

We'll create separate test files for each package to isolate functionality and make debugging easier.
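
Before creating the scripts, it can be worth confirming which interpreter the python command resolves to, since packages installed into one environment are not visible from another. The following is a minimal sketch using only the standard library; the helper name describe_interpreter is illustrative and not part of the original guide.

```python
import sys


def describe_interpreter():
    """Return basic facts about the interpreter running this script."""
    return {
        "executable": sys.executable,       # full path of the running Python binary
        "version": sys.version.split()[0],  # version number without build details
        # True inside a venv/virtualenv, where the prefix differs from the base install
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the executable path is not the one you installed the packages into, the later import tests will likely fail even though the installation itself succeeded.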

🔢 Testing NumPy: Array Operations

Let's start with NumPy, the foundation of numerical computing in Python.

Creating the NumPy Test Script

touch test_numpy.py

What this command does:

  • touch creates an empty file named test_numpy.py
  • If the file already exists, touch leaves its contents alone and only updates its timestamp
  • This is a common way to create an empty file on Linux

Editing the Script

nano test_numpy.py

After editing in nano (a terminal-based text editor), let's examine what we created:

cat test_numpy.py

Output:

import numpy as np

array = np.array([1,2,3,4,5])

print(f"NumPy array: {array}")

Understanding the NumPy Test Code

Let's break down each line of our NumPy test:

| Line | Code | Purpose |
|------|------|---------|
| 1 | import numpy as np | Imports the NumPy library with the alias 'np' (standard convention) |
| 2 | array = np.array([1,2,3,4,5]) | Creates a NumPy array from a Python list |
| 3 | print(f"NumPy array: {array}") | Displays the array using f-string formatting |

Running the NumPy Test

python test_numpy.py

Output:

NumPy array: [1 2 3 4 5]

What the output tells us:

  1. Successful Import: NumPy imported without errors
  2. Array Creation: The np.array() function worked correctly
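  3. Display Format: Notice there are no commas between the numbers. This is how NumPy prints an array, which distinguishes it from a Python list
  4. Memory Efficiency: This is not visible in the output, but NumPy stores its elements in a contiguous, fixed-type buffer, which is typically more compact than a Python list of integers

✅ NumPy Test Success: The output confirms that NumPy is working correctly. The array displays as [1 2 3 4 5] without commas, which is how NumPy prints a one-dimensional array.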
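
To make the display format and storage points concrete, here is a small sketch that prints the same array in several forms. The dtype is pinned to int64 as an assumption so that the byte counts do not depend on the platform's default integer size.

```python
import numpy as np

# Fix the dtype so the byte counts below do not depend on the platform default
array = np.array([1, 2, 3, 4, 5], dtype=np.int64)

print(str(array))      # the form print() and f-strings use: no commas
print(repr(array))     # the repr form, which does include commas
print(array.tolist())  # converted back to a plain Python list
print(array.itemsize)  # bytes used per element (8 for int64)
print(array.nbytes)    # total bytes of element data (5 elements * 8 bytes)
```

The first line matches what the test script printed; the remaining lines show that the missing commas are purely a display choice and that each element occupies a fixed number of bytes.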

📊 Testing Pandas: DataFrame Operations

Now let's test Pandas for data manipulation capabilities.

Creating the Pandas Test Script

nano test_pandas.py

Let's examine the Pandas test script:

cat test_pandas.py

Output:

import pandas as pd

data = {'Name': ['Alice', 'Bob'], 'Age': [24, 30]}

df = pd.DataFrame(data)

print(f"Pandas DataFrame:\n{df}")

Understanding the Pandas Test Code

| Line | Code | Explanation |
|------|------|-------------|
| 1 | import pandas as pd | Imports Pandas with the standard alias 'pd' |
| 2 | data = {'Name': ['Alice', 'Bob'], 'Age': [24, 30]} | Creates a Python dictionary with sample data |
| 3 | df = pd.DataFrame(data) | Converts the dictionary to a Pandas DataFrame |
| 4 | print(f"Pandas DataFrame:\n{df}") | Displays the DataFrame with newline formatting |

Running the Pandas Test

python test_pandas.py

Output:

Pandas DataFrame:
    Name  Age
0  Alice   24
1    Bob   30

Understanding the Pandas Output

Let's analyze what this DataFrame output means:

Structure Analysis:

    Name  Age     ← Column headers
0  Alice   24     ← Row 0 (index 0)
1    Bob   30     ← Row 1 (index 1)

Key Elements:

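| Element | Description | Value |
|---------|-------------|-------|
| Index | Row identifiers (leftmost column) | 0, 1 |
| Columns | Data categories | Name, Age |
| Data Types | Automatically inferred | Text for Name (object or string dtype, depending on the Pandas version), integer for Age |
| Alignment | Automatic column alignment | Properly formatted table |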
💡 DataFrame Features: Notice how Pandas automatically formatted the data into a table structure with proper alignment and automatically assigned row indices (0, 1). This is one of Pandas' key strengths for data analysis.
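
The elements described above can also be read directly from the DataFrame's attributes rather than from the printed table. A short sketch, rebuilding the same two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [24, 30]})

print(df.index.tolist())    # row labels assigned automatically
print(df.columns.tolist())  # column names taken from the dictionary keys
print(df.dtypes)            # inferred type of each column
print(df.shape)             # (number of rows, number of columns)
```

The exact dtype reported for the Name column may differ between Pandas versions, so treat that part of the output as informational.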

🤖 Testing Scikit-learn: Dataset Loading

Finally, let's test Scikit-learn with a real dataset.

Creating the Scikit-learn Test Script

nano test_sklearn.py

Let's examine the Scikit-learn test script:

cat test_sklearn.py

Output:

from sklearn.datasets import load_iris

iris = load_iris()

print("Iris dataset, target names:", iris.target_names)

Understanding the Scikit-learn Test Code

| Line | Code | Purpose |
|------|------|---------|
| 1 | from sklearn.datasets import load_iris | Imports the Iris dataset loader function |
| 2 | iris = load_iris() | Loads the complete Iris dataset into memory |
| 3 | print("Iris dataset, target names:", iris.target_names) | Displays the classification categories |

Running the Scikit-learn Test

python test_sklearn.py

Output:

Iris dataset, target names: ['setosa' 'versicolor' 'virginica']

Understanding the Scikit-learn Output

What this output reveals:

  1. Successful Import: Scikit-learn and its datasets module loaded correctly
  2. Dataset Access: The load_iris() function worked and returned a dataset object
  3. Classification Labels: The Iris dataset contains three species of iris flowers
  4. Array Format: Output shows NumPy array format (no commas) containing three string labels

About the Iris Dataset:

  • Classic Dataset: One of the most famous datasets in machine learning
  • 150 Samples: 50 samples of each of the three species
  • 4 Features: Sepal length, sepal width, petal length, petal width
  • 3 Classes: setosa, versicolor, virginica (iris species)
✅ Complete Success: All three packages imported and ran without errors. Your Python data science environment is ready to use.
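
The facts listed about the Iris dataset can be checked against the loaded object itself. This sketch counts the samples, features, and per-class totals; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Rows are samples, columns are features
n_samples, n_features = iris.data.shape
# Count how many samples carry each class label (0, 1, 2)
samples_per_class = np.bincount(iris.target)

print(f"Samples: {n_samples}, features: {n_features}")
print(f"Feature names: {iris.feature_names}")
for name, count in zip(iris.target_names, samples_per_class):
    print(f"{name}: {count} samples")
```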

📁 File Management and Organization

Let's look at how our test files are organized in the system:

Listing Our Test Files

After creating all test scripts, our directory contains:

ls -la *.py

Expected output (similar to):

-rw-rw-r--. 1 centos9 centos9 87 test_numpy.py
-rw-rw-r--. 1 centos9 centos9 134 test_pandas.py
-rw-rw-r--. 1 centos9 centos9 89 test_sklearn.py

File Details:

| File | Size (bytes) | Purpose | Complexity |
|------|--------------|---------|------------|
| test_numpy.py | 87 | Basic array creation and display | Simple |
| test_pandas.py | 134 | DataFrame creation from dictionary | Moderate |
| test_sklearn.py | 89 | Dataset loading and inspection | Simple |
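
The same listing can be produced from Python, which is handy if you later want a script that finds and runs every test file. This is a sketch under the assumption that all test scripts follow the test_*.py naming used in this guide; list_test_scripts is a hypothetical helper.

```python
from pathlib import Path


def list_test_scripts(directory="."):
    """Return (file name, size in bytes) for each test_*.py file, sorted by name."""
    return [
        (path.name, path.stat().st_size)
        for path in sorted(Path(directory).glob("test_*.py"))
    ]


if __name__ == "__main__":
    for name, size in list_test_scripts():
        print(f"{size:>6} bytes  {name}")
```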

🧪 Advanced Testing Ideas

Now that basic functionality is confirmed, here are some additional tests you can try:

Enhanced NumPy Test

import numpy as np

# Test array operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(f"Array 1: {arr1}")
print(f"Array 2: {arr2}")
print(f"Addition: {arr1 + arr2}")
print(f"Element-wise multiplication: {arr1 * arr2}")
print(f"Array shape: {arr1.shape}")
print(f"Array data type: {arr1.dtype}")

Enhanced Pandas Test

import pandas as pd

# Test DataFrame operations
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [24, 30, 35, 28],
    'City': ['New York', 'London', 'Tokyo', 'Paris']
}

df = pd.DataFrame(data)

print("Complete DataFrame:")
print(df)
print(f"\nDataFrame shape: {df.shape}")
print(f"\nColumn names: {list(df.columns)}")
print(f"\nAge statistics:\n{df['Age'].describe()}")

Enhanced Scikit-learn Test

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset and explore structure
iris = load_iris()

print(f"Dataset shape: {iris.data.shape}")
print(f"Number of features: {len(iris.feature_names)}")
print(f"Feature names: {iris.feature_names}")
print(f"Target names: {iris.target_names}")

# Test train_test_split functionality
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

print(f"\nTraining set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
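
The enhanced tests above rely on reading printed output by eye. One possible next step is to turn the same checks into assertions, so that a script fails loudly when something is wrong. The sketch below is one way to do that, not a prescribed method; the check_* function names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def check_numpy():
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4, 5, 6])
    assert np.array_equal(arr1 + arr2, [5, 7, 9])    # element-wise addition
    assert np.array_equal(arr1 * arr2, [4, 10, 18])  # element-wise multiplication
    assert arr1.shape == (3,)


def check_pandas():
    df = pd.DataFrame({
        "Name": ["Alice", "Bob", "Charlie", "Diana"],
        "Age": [24, 30, 35, 28],
    })
    assert df.shape == (4, 2)
    assert df["Age"].mean() == 29.25  # (24 + 30 + 35 + 28) / 4


def check_sklearn():
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42
    )
    # 20% of 150 samples go to the test set, the rest to training
    assert X_train.shape == (120, 4)
    assert X_test.shape == (30, 4)


if __name__ == "__main__":
    for check in (check_numpy, check_pandas, check_sklearn):
        check()
        print(f"{check.__name__}: OK")
```

Keeping one function per package preserves the isolation that the separate test files provide, while letting a single run cover all three.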

🔍 Troubleshooting Common Issues

Import Errors

If you encounter import errors, check:

  1. Package Installation: Verify packages are installed in the correct environment
  2. Python Path: Ensure Python can find the installed packages
  3. Virtual Environment: Check if you're in the correct virtual environment
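
The three checks above can be partly automated. The following sketch attempts each import and reports either where the module was loaded from or the error message; try_import is a hypothetical helper, and the module list simply mirrors the packages in this guide.

```python
import importlib
import sys


def try_import(module_name):
    """Return (True, file location) if the module imports, else (False, error text)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", "built-in")


if __name__ == "__main__":
    # The interpreter path shows which environment is actually in use
    print(f"Interpreter: {sys.executable}")
    for name in ("numpy", "pandas", "sklearn"):
        ok, detail = try_import(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
```

A reported file location outside your intended environment usually points to the wrong interpreter or an inactive virtual environment.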

Version Compatibility

# Check versions
import numpy as np
import pandas as pd
import sklearn

print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
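
If an import itself is failing, the version attributes above are unreachable. As an alternative sketch, the standard library's importlib.metadata can read the installed version from package metadata without importing the package; installed_version is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # These are distribution names as published on PyPI (scikit-learn, not sklearn)
    for dist in ("numpy", "pandas", "scikit-learn"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```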

Performance Testing

import time
import numpy as np

# Test NumPy performance (times both array creation and the sum)
# perf_counter() is a monotonic, high-resolution clock intended for timing
start_time = time.perf_counter()
large_array = np.random.random((1000, 1000))
result = np.sum(large_array)
end_time = time.perf_counter()

print(f"Large array sum: {result:.4f}")
print(f"Computation time: {end_time - start_time:.4f} seconds")
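
A single timing is easier to interpret next to a baseline. The sketch below uses the standard timeit module to compare Python's built-in sum() on a list with NumPy's sum on an equivalent array. The array size and repeat counts are arbitrary choices, and the timings will vary by machine.

```python
import timeit

import numpy as np

values = list(range(1_000_000))
array = np.array(values, dtype=np.int64)

# Both approaches must agree on the answer before the timings mean anything
assert sum(values) == int(array.sum())

# Taking the best of several repeats reduces noise from other processes
list_time = min(timeit.repeat(lambda: sum(values), number=5, repeat=3))
numpy_time = min(timeit.repeat(lambda: array.sum(), number=5, repeat=3))

print(f"Python sum(): {list_time:.4f} s for 5 runs")
print(f"NumPy .sum(): {numpy_time:.4f} s for 5 runs")
```

On most systems the NumPy figure is expected to be noticeably smaller, but the point of the test is only that both run and agree.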

🎯 Key Takeaways

✅ Remember These Points

  1. Test Each Package Separately: Isolate functionality to identify specific issues
  2. Understand Output Formats: Each package has distinct output representations
  3. Use Standard Names: np and pd are the conventional aliases for NumPy and Pandas, and Scikit-learn is imported under its package name sklearn
  4. Start Simple: Basic tests confirm installation before complex operations
  5. Incremental Complexity: Build from simple imports to advanced functionality

🎉 Testing Complete! You've successfully verified that NumPy, Pandas, and Scikit-learn are working correctly on your system. Your Python data science environment is now ready for real-world projects and advanced learning.

Ready for your first data science project? You now have all the tools needed to start analyzing data, building models, and creating insights!

This testing guide provides practical verification methods for your Python data science installation. Regular testing ensures your environment remains functional as you add new packages and update existing ones.

Written by Owais

I'm an AIOps Engineer with a passion for AI, Operating Systems, Cloud, and Security, sharing insights that matter in today's tech world.
