After successfully installing NumPy, Pandas, and Scikit-learn, it's crucial to verify that everything works correctly. This hands-on guide demonstrates how to test your installation by creating practical examples with each package, complete with explanations of the code and output.
What You'll Learn: In this practical tutorial, you'll discover:
- How to create and test Python scripts for each data science package
- Understanding NumPy arrays and their basic operations
- Working with Pandas DataFrames for data manipulation
- Loading and exploring datasets with Scikit-learn
- Interpreting output and verifying successful installations
- Best practices for organizing test scripts
Setting Up the Test Environment
Before we start testing our packages, let's understand the workflow for creating and running test scripts on a Linux system.
Creating Test Files
We'll create separate test files for each package to isolate functionality and make debugging easier.
Testing NumPy: Array Operations
Let's start with NumPy, the foundation of numerical computing in Python.
Creating the NumPy Test Script
touch test_numpy.py
What this command does:
- touch creates an empty file named test_numpy.py
- If the file already exists, it only updates the file's timestamp
- This is a common way to create an empty file from a Linux shell
Editing the Script
nano test_numpy.py
Type the three lines of code shown below into nano (a terminal-based text editor), save with Ctrl+O and exit with Ctrl+X. Then display the file to check what we created:
cat test_numpy.py
Output:
import numpy as np
array = np.array([1,2,3,4,5])
print(f"NumPy array: {array}")
Understanding the NumPy Test Code
Let's break down each line of our NumPy test:
Line | Code | Purpose |
---|---|---|
1 | import numpy as np | Imports NumPy library with alias 'np' (standard convention) |
2 | array = np.array([1,2,3,4,5]) | Creates a NumPy array from a Python list |
3 | print(f"NumPy array: {array}") | Displays the array using f-string formatting |
Running the NumPy Test
python test_numpy.py
Output:
NumPy array: [1 2 3 4 5]
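What the output tells us:
- Successful Import: NumPy imported without errors
- Array Creation: The np.array() function worked correctly
- Display Format: There are no commas between the numbers. This is how NumPy prints arrays, unlike Python lists
- Memory Efficiency (not shown by this output): NumPy stores elements in a single contiguous, fixed-type buffer, which is typically more compact than a Python list of separate integer objects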
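You can check the memory point yourself by comparing an array's data buffer with the size of an equivalent Python list. This is a rough sketch. The 1,000-element size is an arbitrary choice, and sys.getsizeof measures each object shallowly, so treat the list total as an approximation.

```python
import sys

import numpy as np

# 1,000 integers stored two ways (the size is an arbitrary choice)
values = list(range(1000))
array = np.array(values, dtype=np.int64)

# NumPy: one contiguous buffer of fixed-size int64 elements (8 bytes each)
array_bytes = array.nbytes

# Python list: the list's pointer array plus a separate int object per element
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

print(f"NumPy buffer: {array_bytes} bytes")
print(f"Python list (list + int objects): {list_bytes} bytes")
```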
NumPy Test Success: The output confirms that NumPy is working correctly. The array displays as [1 2 3 4 5] without commas, which is NumPy's standard way of printing arrays.
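The missing commas come from how NumPy turns an array into text, not from the data itself. The short sketch below compares print()/str() with repr() and with .tolist(), which converts the array back into a plain Python list.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

# print() and f-strings use str(), which separates elements with spaces
print(str(array))      # [1 2 3 4 5]

# repr() shows the constructor form, with commas
print(repr(array))     # array([1, 2, 3, 4, 5])

# .tolist() converts the array back to a plain Python list
print(array.tolist())  # [1, 2, 3, 4, 5]
```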
Testing Pandas: DataFrame Operations
Now let's test Pandas for data manipulation capabilities.
Creating the Pandas Test Script
nano test_pandas.py
Let's examine the Pandas test script:
cat test_pandas.py
Output:
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [24, 30]}
df = pd.DataFrame(data)
print(f"Pandas DataFrame:\n{df}")
Understanding the Pandas Test Code
Line | Code | Explanation |
---|---|---|
1 | import pandas as pd | Imports Pandas with standard alias 'pd' |
2 | data = {'Name': ['Alice', 'Bob'], 'Age': [24, 30]} | Creates a Python dictionary with sample data |
3 | df = pd.DataFrame(data) | Converts dictionary to Pandas DataFrame |
4 | print(f"Pandas DataFrame:\n{df}") | Displays the DataFrame with newline formatting |
Running the Pandas Test
python test_pandas.py
Output:
Pandas DataFrame:
    Name  Age
0  Alice   24
1    Bob   30
Understanding the Pandas Output
Let's analyze what this DataFrame output means:
Structure Analysis:
Name Age ← Column headers
0 Alice 24 ← Row 0 (index 0)
1 Bob 30 ← Row 1 (index 1)
Key Elements:
Element | Description | Value |
---|---|---|
Index | Row identifiers (leftmost column) | 0, 1 |
Columns | Data categories | Name, Age |
Data Types | Automatically inferred | Text (Name; object dtype in pandas 2.x), int64 (Age) |
Alignment | Automatic column alignment | Properly formatted table |
DataFrame Features: Notice how Pandas automatically formatted the data into a table structure with proper alignment and automatically assigned row indices (0, 1). This is one of Pandas' key strengths for data analysis.
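You can also inspect the inferred types and the automatic index directly instead of reading them off the printed table. The minimal sketch below reuses the same two-row data. The dtype reported for the Name column depends on your pandas version (object in pandas 2.x).

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob'], 'Age': [24, 30]}
df = pd.DataFrame(data)

# Inferred column types (Age is int64; Name depends on the pandas version)
print(df.dtypes)

# The automatically assigned row labels: a RangeIndex starting at 0
print(df.index)

# Rows can be selected by those labels with .loc
print(df.loc[1, 'Name'])  # Bob
```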
Testing Scikit-learn: Dataset Loading
Finally, let's test Scikit-learn with a real dataset.
Creating the Scikit-learn Test Script
nano test_sklearn.py
Let's examine the Scikit-learn test script:
cat test_sklearn.py
Output:
from sklearn.datasets import load_iris
iris = load_iris()
print("Iris dataset, target names:", iris.target_names)
Understanding the Scikit-learn Test Code
Line | Code | Purpose |
---|---|---|
1 | from sklearn.datasets import load_iris | Imports the Iris dataset loader function |
2 | iris = load_iris() | Loads the complete Iris dataset into memory |
3 | print("Iris dataset, target names:", iris.target_names) | Displays the classification categories |
Running the Scikit-learn Test
python test_sklearn.py
Output:
Iris dataset, target names: ['setosa' 'versicolor' 'virginica']
Understanding the Scikit-learn Output
What this output reveals:
- Successful Import: Scikit-learn and its datasets module loaded correctly
- Dataset Access: The load_iris() function worked and returned a Bunch, a dictionary-like object whose keys can also be read as attributes
- Classification Labels: The Iris dataset contains three species of iris flowers
- Array Format: Output shows NumPy array format (no commas) containing three string labels
About the Iris Dataset:
- Classic Dataset: One of the most famous datasets in machine learning
- 150 Samples: 50 samples of each of the three species
- 4 Features: Sepal length, sepal width, petal length, petal width
- 3 Classes: setosa, versicolor, virginica (iris species)
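You can confirm these numbers from the loaded object itself. The sketch below assumes the same load_iris() call as the test script. np.bincount counts how many times each integer class label (0, 1, 2) appears in iris.target.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Dictionary-style and attribute-style access return the same data
print(iris["target_names"])

# 150 samples x 4 features
print(iris.data.shape)

# Count samples per class: target holds integer labels 0, 1, 2
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(f"{name}: {count}")
```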
Complete Success: All three packages are working perfectly! You now have a fully functional Python data science environment ready for serious work.
File Management and Organization
Let's look at how our test files are organized in the system:
Listing Our Test Files
After creating all test scripts, our directory contains:
ls -la *.py
Expected output (similar to):
-rw-rw-r--. 1 centos9 centos9 87 test_numpy.py
-rw-rw-r--. 1 centos9 centos9 134 test_pandas.py
-rw-rw-r--. 1 centos9 centos9 89 test_sklearn.py
File Details:
File | Size (bytes) | Purpose | Complexity |
---|---|---|---|
test_numpy.py | 87 | Basic array creation and display | Simple |
test_pandas.py | 134 | DataFrame creation from dictionary | Moderate |
test_sklearn.py | 89 | Dataset loading and inspection | Simple |
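Once the three scripts exist, you can rerun them together while still keeping each one isolated. The sketch below starts a separate interpreter process for each script, so a failure in one does not hide the results of the others. The TEST_SCRIPTS list is a hypothetical helper that names the files created in this guide. The scripts are assumed to sit in the current directory.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical list: the three scripts created in this guide
TEST_SCRIPTS = ["test_numpy.py", "test_pandas.py", "test_sklearn.py"]


def run_script(path):
    """Run one script with the current interpreter; return (ok, combined output)."""
    result = subprocess.run(
        [sys.executable, str(path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr


def run_all(scripts):
    """Run each script in its own process and collect the results by name."""
    results = {}
    for name in scripts:
        if not Path(name).exists():
            results[name] = (False, "file not found")
            continue
        results[name] = run_script(name)
    return results


if __name__ == "__main__":
    for name, (ok, output) in run_all(TEST_SCRIPTS).items():
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
        if not ok:
            print(output)
```

Using sys.executable ensures the scripts run under the same interpreter as the runner. That interpreter is the one whose installed packages you are checking.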
Advanced Testing Ideas
Now that basic functionality is confirmed, here are some additional tests you can try:
Enhanced NumPy Test
import numpy as np
# Test array operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(f"Array 1: {arr1}")
print(f"Array 2: {arr2}")
print(f"Addition: {arr1 + arr2}")
print(f"Element-wise multiplication: {arr1 * arr2}")
print(f"Array shape: {arr1.shape}")
print(f"Array data type: {arr1.dtype}")
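The enhanced test above prints its results for you to check by eye. A self-checking variant uses assertions instead, so the script stops with an error if any result is wrong. The expected values below were worked out by hand from the two example arrays.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# np.testing raises AssertionError with a readable message on mismatch
np.testing.assert_array_equal(arr1 + arr2, [5, 7, 9])
np.testing.assert_array_equal(arr1 * arr2, [4, 10, 18])
assert arr1.shape == (3,)

# Integer width is platform-dependent (int64 on 64-bit Linux), so check only the kind
assert np.issubdtype(arr1.dtype, np.integer)

# Dot product: 1*4 + 2*5 + 3*6 = 32
assert np.dot(arr1, arr2) == 32

print("NumPy checks passed")
```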
Enhanced Pandas Test
import pandas as pd
# Test DataFrame operations
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Age': [24, 30, 35, 28],
'City': ['New York', 'London', 'Tokyo', 'Paris']
}
df = pd.DataFrame(data)
print("Complete DataFrame:")
print(df)
print(f"\nDataFrame shape: {df.shape}")
print(f"\nColumn names: {list(df.columns)}")
print(f"\nAge statistics:\n{df['Age'].describe()}")
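Here is the same idea applied to the enhanced Pandas test: a self-checking sketch built on the same four-row data. It also tries two common operations, boolean filtering and sorting. The expected values are computed by hand from the example ages.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [24, 30, 35, 28],
    'City': ['New York', 'London', 'Tokyo', 'Paris'],
}
df = pd.DataFrame(data)

# Shape is (rows, columns)
assert df.shape == (4, 3)

# Mean age: (24 + 30 + 35 + 28) / 4 = 29.25
assert df['Age'].mean() == 29.25

# Boolean filtering: keep only rows where Age is above 28
older = df[df['Age'] > 28]
assert older['Name'].tolist() == ['Bob', 'Charlie']

# sort_values returns a new DataFrame; df itself is unchanged
youngest_first = df.sort_values('Age')
assert youngest_first['Name'].tolist() == ['Alice', 'Diana', 'Bob', 'Charlie']

print("Pandas checks passed")
```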
Enhanced Scikit-learn Test
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load dataset and explore structure
iris = load_iris()
print(f"Dataset shape: {iris.data.shape}")
print(f"Number of features: {len(iris.feature_names)}")
print(f"Feature names: {iris.feature_names}")
print(f"Target names: {iris.target_names}")
# Test train_test_split functionality
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.2, random_state=42
)
print(f"\nTraining set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
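With a small test set, a plain random split can leave the classes unevenly represented. train_test_split accepts a stratify argument that keeps the class proportions equal across both splits. The sketch below reuses the same test_size and random_state. Because the three Iris classes have 50 samples each, the 30 test rows split evenly, 10 per class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# stratify=iris.target keeps the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# 20% of 150 samples -> 30 test rows; with 3 equal classes that is 10 per class
print(f"Test class counts: {np.bincount(y_test)}")
print(f"Training class counts: {np.bincount(y_train)}")
```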
Troubleshooting Common Issues
Import Errors
If you encounter import errors, check:
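- Package Installation: Run python -m pip show numpy pandas scikit-learn to confirm the packages are installed for the same interpreter that runs your scripts
- Python Path: Check which interpreter is running with python -c "import sys; print(sys.executable)" and make sure it is the one you installed the packages into
- Virtual Environment: If you installed the packages into a virtual environment, activate it before running the test scripts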
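The checks above can be combined into one diagnostic script. This sketch uses only the standard library. importlib.util.find_spec reports whether a module can be found without importing it. Comparing sys.prefix with sys.base_prefix detects environments created with venv, but not conda environments, which are separate installations.

```python
import importlib.util
import sys


def find_packages(module_names):
    """Return {module name: True/False} without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # Which interpreter is running, and is it inside a venv-style environment?
    print(f"Interpreter: {sys.executable}")
    print(f"Virtual environment active: {sys.prefix != sys.base_prefix}")

    # Import names differ from pip names: scikit-learn is imported as sklearn
    for name, found in find_packages(["numpy", "pandas", "sklearn"]).items():
        print(f"{name}: {'found' if found else 'NOT FOUND'}")
```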
Version Compatibility
# Check versions
import numpy as np
import pandas as pd
import sklearn
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
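The version script above fails with an ImportError as soon as one package is missing. The sketch below reads versions from the installed package metadata with importlib.metadata (Python 3.8+) instead. It uses distribution names, which is why scikit-learn appears in place of sklearn, and reports missing packages rather than stopping.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution (pip) names, which can differ from import names
DISTRIBUTIONS = ["numpy", "pandas", "scikit-learn"]


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in DISTRIBUTIONS:
        found = installed_version(dist)
        print(f"{dist}: {found if found else 'not installed'}")
```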
Performance Testing
import time
import numpy as np
# Time array creation plus summation (perf_counter is a high-resolution clock for intervals)
start_time = time.perf_counter()
large_array = np.random.random((1000, 1000))
result = np.sum(large_array)
end_time = time.perf_counter()
print(f"Large array sum: {result:.4f}")
print(f"Creation + sum time: {end_time - start_time:.4f} seconds")
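The timing above measures array creation and summation together, in a single run. To time the summation alone, the standard timeit module can repeat a call and return the total elapsed time. The sketch below compares np.sum with the built-in sum on the same one million floats. np.sum is typically much faster, but the exact numbers depend on your machine. The seed and the repeat count of 10 are arbitrary choices.

```python
import timeit

import numpy as np

# One million random floats; the seeded generator makes runs repeatable
rng = np.random.default_rng(0)
values = rng.random(1_000_000)
values_list = values.tolist()

# timeit calls each function `number` times and returns the total seconds
numpy_time = timeit.timeit(lambda: np.sum(values), number=10)
python_time = timeit.timeit(lambda: sum(values_list), number=10)

print(f"np.sum:       {numpy_time / 10:.5f} s per call")
print(f"built-in sum: {python_time / 10:.5f} s per call")

# Both sums agree up to floating-point rounding
print(np.isclose(np.sum(values), sum(values_list)))
```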
Key Takeaways
Remember These Points
- Test Each Package Separately: Isolate functionality to identify specific issues
- Understand Output Formats: Each package has distinct output representations
- Use Standard Names: np and pd are the conventional aliases for NumPy and Pandas, and scikit-learn is always imported as sklearn; these names are universally recognized
- Start Simple: Basic tests confirm installation before complex operations
- Incremental Complexity: Build from simple imports to advanced functionality
Learning Resources
- NumPy Documentation: numpy.org
- Pandas Tutorials: pandas.pydata.org
- Scikit-learn Guide: scikit-learn.org
- Jupyter Notebooks: Consider installing Jupyter for interactive development
Testing Complete! You've successfully verified that NumPy, Pandas, and Scikit-learn are working correctly on your system. Your Python data science environment is now ready for real-world projects and advanced learning.
Ready for your first data science project? You now have all the tools needed to start analyzing data, building models, and creating insights!
Discussion
How did your testing experience go?
- Which package output surprised you the most?
- Have you tried the enhanced testing examples?
- What type of data science project interests you most?
- Did you encounter any issues during testing?
This testing guide provides practical verification methods for your Python data science installation. Regular testing ensures your environment remains functional as you add new packages and update existing ones.