Machine Learning
Reading Different Data Inputs in Machine Learning with Python
Useful methods to read inputs with Python
Introduction
The first step in machine learning is loading the data. The data can be structured or unstructured, and it can come from a log file, a dataset file, or a database. This article covers general methods for loading data from a variety of sources, including CSV files and SQL databases.
Loading a Sample Dataset
We can load a preexisting sample dataset with scikit-learn, which ships with several popular datasets.
# Load scikit-learn's datasets
from sklearn import datasets
# Load digits dataset
digits = datasets.load_digits()
# Create features matrix
features = digits.data
# Create target vector
target = digits.target
# View first observation
features[0]
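As a quick check on what was loaded, the sketch below prints the shapes of the features matrix and target vector, then reshapes the first observation back into its 8x8 image. The name first_image is illustrative and not from the original.

```python
from sklearn import datasets

# Load the digits dataset as in the snippet above
digits = datasets.load_digits()
features = digits.data
target = digits.target

# Each row of the features matrix is one flattened 8x8 image
print(features.shape)
print(target.shape)

# Reshape the first observation into an 8x8 grid (illustrative name)
first_image = features[0].reshape(8, 8)
print(first_image)
```

Checking shapes right after loading is a cheap way to confirm that observations are in rows and features are in columns before any modeling.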
With a real-world dataset, we generally have to load, transform, and clean the data first. Fortunately, scikit-learn includes some common datasets that we can load directly.
A few example datasets are:
- load_boston (deprecated in scikit-learn 1.0 and removed in 1.2)
- load_iris
- load_digits
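The loaders in this list share the same calling pattern. As a minimal sketch, load_iris can also return the features and target directly through its return_X_y parameter. The variable names here are illustrative.

```python
from sklearn.datasets import load_iris

# return_X_y=True skips the container object and returns (features, target)
iris_features, iris_target = load_iris(return_X_y=True)

# One row per flower, one column per measurement
print(iris_features.shape)

# The target holds one integer class label per row
print(iris_target[:5])
```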
Creating a Simulated Dataset
We can also generate a dataset from simulated data. Scikit-learn offers many methods for creating simulated data; three are used often.
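The text at this point does not name all three methods, so the following is only a sketch of two other generators in sklearn.datasets, make_classification and make_blobs. Picking these two is an assumption, and the parameter values and variable names are illustrative.

```python
from sklearn.datasets import make_blobs, make_classification

# Simulated data for classification: 100 observations, 3 informative features
# (n_redundant must be set to 0 because its default of 2 would exceed n_features)
clf_features, clf_target = make_classification(n_samples=100,
                                               n_features=3,
                                               n_informative=3,
                                               n_redundant=0,
                                               n_classes=2,
                                               random_state=1)

# Simulated data for clustering: 100 points scattered around 3 centers
blob_features, blob_target = make_blobs(n_samples=100,
                                        n_features=2,
                                        centers=3,
                                        cluster_std=0.5,
                                        random_state=1)

print(clf_features.shape, clf_target.shape)
print(blob_features.shape, blob_target.shape)
```

Fixing random_state makes the generated data reproducible, which helps when comparing models on the same simulated dataset.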
For linear regression, use make_regression:
# Load library
from sklearn.datasets import make_regression
# Generate features matrix, target vector, and the true coefficients
features, target, coefficients = make_regression(n_samples = 100,
                                                 n_features = 3,
                                                 n_informative = 3,
                                                 n_targets = 1,
                                                 noise = 0.0,
                                                 coef = True…