NumPy. This library has become fundamental; it is hard to imagine research and data science in Python without it. NumPy has been around since 2005, and if you have ever worked with data in Python, you have almost certainly used it, one way or another.

What is NumPy?

Features

  • Multidimensional homogeneous arrays: the ndarray is an n-dimensional array whose elements all share one data type.
  • Various functions for arrays.
  • Reshaping of arrays.
  • Python with NumPy can be used as an alternative to MATLAB.

One trade-off of using Python is its computing speed, whereas C is known for being fast. The developers therefore wrote a package of numerical functions in C that you can call from Python. So, without having to learn C, you can use its power in Python.

The biggest advantage of NumPy is its ability to handle numerical arrays. For example, if you have a list of values and you want to square each of them, the code in base Python will look like:

a = [1, 2, 3, 4, 5]
b = []
for i in a:
    # square each element, not the whole list
    b.append(i**2)

and you will get [1, 4, 9, 16, 25] for b. Now, if you want to do the same with a 2-dimensional array, the base Python to do this is:

a = [[1, 2], [3, 4]]
b = [[],[]]
for i in range(len(a)):
    for j in range(len(a[i])):
        b[i].append(a[i][j]**2)

This would give you b equal to [[1, 4], [9, 16]]. To do the same with a 3D array you would need 3 nested loops and to do it in 4D would require 4 nested loops. However, with NumPy you can take the square of an array of any dimensions using the same line of code and no loops:

import numpy as np
b = np.array(a)**2

Using numpy is much faster than the base python version! It is faster to run, saving you on computing time, and faster to write, saving you time writing your code. All of this allows you to write and run code much faster, and therefore do more science in less time. Not only that, if your friend has a look at your code, they will read the code and understand you want a squared value of the array in an instant, without having to decipher what the for loop is trying to do.
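To make the "any number of dimensions" point concrete, here is a minimal sketch that squares a 3D array with the same one-line expression and checks the result against the three nested loops plain Python would need. The values and the names nested, squared_loops and squared_np are illustrative assumptions, not part of the example above.

```python
import numpy as np

# A small 2 x 2 x 2 nested list (illustrative values).
nested = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Plain Python: one loop per dimension.
squared_loops = []
for plane in nested:
    new_plane = []
    for row in plane:
        new_row = []
        for value in row:
            new_row.append(value ** 2)
        new_plane.append(new_row)
    squared_loops.append(new_plane)

# NumPy: the same expression as in the 1D and 2D cases.
squared_np = np.array(nested) ** 2

print(squared_np.shape)
print(squared_np.tolist() == squared_loops)
```

Both versions produce the same values; only the NumPy line stays the same length as the number of dimensions grows.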

NumPy serves as the basis of most scientific packages in Python, including pandas, matplotlib, scipy, etc. Hence, it would be a good idea to explore the basics of data handling in Python with NumPy.

Installation requirements

The code is written to be compatible with Python 3.4 and 2.7 and uses NumPy version 1.9. The easiest way to install these requirements (and more) is to install a complete Python distribution, such as Enthought Canopy, EPD, Anaconda, or Python (x,y). Once you have installed any one of these, you can safely skip the remainder of this section and should be ready to begin.

Using Python package managers

$ pip install numpy
$ easy_install numpy
$ enpkg numpy # for Canopy users
$ conda install numpy # for Anaconda users

Using native package managers

Package managers and Commands:

Aptitude

$ sudo apt-get install python-numpy

Yum

$ yum install python-numpy

Homebrew

$ brew install numpy

Note that, when installing NumPy (or any other Python modules) on OS X systems with Homebrew, Python should have been originally installed with Homebrew.

Detailed installation instructions are available on the respective websites of NumPy, IPython, and matplotlib. As a precaution, to check whether NumPy was installed properly, open an IPython terminal and type the following command:

In [1]: import numpy as np 

If this statement appears to do nothing, that is a good sign: executing without any output means that NumPy was installed and has been imported properly into your Python session.
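If you would rather see some visible confirmation, a small sketch like the following prints the installed version. The exact string depends on your installation, so no particular value is assumed here.

```python
import numpy as np

# __version__ is a plain string; its value depends on the installed release.
print(np.__version__)
```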

Congratulations! We are now ready to begin.

Why should we use NumPy?

  1. Less memory usage
  2. Fast performance
  3. Convenient to work with

The first reason to prefer NumPy arrays is that they take less memory than Python lists. They are also fast to execute and, at the same time, convenient and easy to work with.
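The memory claim can be checked with a rough sketch like the one below. It assumes CPython, where sys.getsizeof reports the size of a single object, and it fixes the element type to 64-bit integers; the names py_list, np_arr, list_bytes and array_bytes are made up for this illustration.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects,
# so count the list itself plus every element it points to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An ndarray stores the raw values in one contiguous block.
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```

The exact list figure varies between Python versions, but the array always needs 8 bytes per element for this dtype.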

What can we do with NumPy?

arrayA = ['Hello', 'world']
print(arrayA)

This is still a Python list, not an array.

This is where NumPy comes in: we can use it to create 2D, 3D, and higher-dimensional arrays, and to do computations on them.

import numpy as num
arr = num.array([1,2,3,4,5,6])
print(arr)

This creates the one-dimensional array arr.

For 2D and 3D arrays, pass nested sequences; every row must have the same length:

import numpy as num
arr = num.array([(1,2,3,4,5),(6,7,8,9,10)])
print(arr)

–If you want to know the number of dimensions of your array, you can use the ndim attribute.

print(arr.ndim)

–If you want to find out the size of an array, that is, its total number of elements, you can use the size attribute.

print(arr.size)

–To find out the shape of an array, you can use the shape attribute.

print(arr.shape)

It returns the length of the array along each dimension; for a 2D array this is (rows, columns).

You can also use slicing, reshaping and many more methods with numpy arrays.
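As a small sketch of these attributes together with reshaping and slicing, assuming a 2 x 5 array of the integers 1 to 10 (the names reshaped, first_row and last_two_cols are illustrative):

```python
import numpy as np

arr = np.arange(1, 11).reshape(2, 5)   # 2 rows, 5 columns
print(arr.ndim)    # number of dimensions
print(arr.size)    # total number of elements
print(arr.shape)   # (rows, columns)

# Reshape the same ten values into 5 rows and 2 columns.
reshaped = arr.reshape(5, 2)
print(reshaped.shape)

# Slice out the first row, and the last two columns of every row.
first_row = arr[0]
last_two_cols = arr[:, -2:]
print(first_row)
print(last_two_cols)
```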

Why do we Need?

NumPy Ndarray

In NumPy, the number of dimensions of an array is called its rank. In the above example, the ranks of the 1D, 2D, and 3D arrays are 1, 2 and 3 respectively.

Syntax:

np.ndarray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)

Here, shape gives the length of the array along each dimension, and therefore the number of elements it holds. dtype gives the data type of the elements. buffer is an object exposing the buffer interface, used to fill the array with data. offset is the offset, in bytes, of the array data within the buffer. strides specifies the number of bytes to step in memory between the starts of successive elements along each dimension.

For a contiguous array, the smallest stride equals the size of the element data type. Finally, order specifies whether we want a row-major ('C') or column-major ('F') memory layout. Of these parameters, only shape is required; dtype defaults to float, and the others are optional and can be specified as needed.
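The following sketch shows how buffer, offset, strides and order interact. It assumes 8-byte integers and reuses an existing NumPy array as the buffer object; the names buf, c_order and f_order are invented for the illustration.

```python
import numpy as np

# Seven 8-byte integers: 0, 1, ..., 6. A NumPy array exposes the buffer interface.
buf = np.arange(7, dtype=np.int64)

# Skip the first element (offset of 8 bytes) and read the next six
# values as a 2 x 3 array in row-major order.
c_order = np.ndarray(shape=(2, 3), dtype=np.int64, buffer=buf, offset=8, order='C')
print(c_order)
print(c_order.strides)   # bytes to step per row, per column

# Read the same six values in column-major order instead.
f_order = np.ndarray(shape=(2, 3), dtype=np.int64, buffer=buf, offset=8, order='F')
print(f_order)
print(f_order.strides)
```

In everyday code, arrays are normally created with np.array, np.zeros and similar helpers rather than by calling np.ndarray directly.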

Working with Ndarray

  • np.ndarray(shape, dtype): Creates an array of the given shape without initializing its contents, so the values are arbitrary.
  • np.array(array_object): Creates an array from a list or tuple.
  • np.zeros(shape): Creates an array of the given shape with all zeros.
  • np.ones(shape): Creates an array of the given shape with all ones.
  • np.full(shape, fill_value, dtype): Creates an array of the given shape with every element set to fill_value.
  • np.arange(start, stop, step): Creates an array of evenly spaced values within the specified range.
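A short sketch of these constructors in use; the shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]])   # from a nested list
zeros = np.zeros((2, 3))                        # all zeros, float by default
ones = np.ones((2, 3))                          # all ones, float by default
sevens = np.full((2, 3), 7, dtype=np.int64)     # every element set to 7
evens = np.arange(0, 10, 2)                     # start, stop (exclusive), step

for name, value in [('from_list', from_list), ('zeros', zeros),
                    ('ones', ones), ('sevens', sevens), ('evens', evens)]:
    print(name, value.shape, value.dtype)
```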

Examples of Ndarray

Example #1: Attributes of a multidimensional array(ndarray)

Output:

Indexing & Slicing

As mentioned earlier, items in an ndarray object follow a zero-based index. Three types of indexing methods are available − field access, basic slicing and advanced indexing.
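Before going into detail, the three methods can be sketched side by side. The structured array and its field names 'x' and 'y' are hypothetical, chosen only to illustrate field access.

```python
import numpy as np

# Field access: a structured array with two named fields.
points = np.array([(1, 2.5), (3, 4.5)], dtype=[('x', 'i4'), ('y', 'f8')])
xs = points['x']

# Basic slicing: start, stop, step.
a = np.arange(10)
sliced = a[2:7:2]

# Advanced indexing: an integer list or a boolean mask selects arbitrary items.
picked = a[[0, 3, 9]]
masked = a[a > 6]

print(xs, sliced, picked, masked)
```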

Basic slicing is an extension of Python’s basic concept of slicing to n dimensions. A Python slice object is constructed by giving start, stop, and step parameters to the built-in slice function. This slice object is passed to the array to extract a part of array.

Example #1

import numpy as np 
a = np.arange(10)
s = slice(2,7,2)
print(a[s])

Its output is as follows −

[2 4 6]

In the above example, an ndarray object is prepared by arange() function. Then a slice object is defined with start, stop, and step values 2, 7, and 2 respectively. When this slice object is passed to the ndarray, a part of it starting with index 2 up to 7 with a step of 2 is sliced.

The same result can also be obtained by giving the slicing parameters separated by a colon : (start:stop:step) directly to the ndarray object.

Example #2

import numpy as np 
a = np.arange(10)
b = a[2:7:2]
print(b)

Here, we will get the same output −

[2 4 6]

If only one parameter is given, a single item corresponding to the index is returned. If a : is placed after it, all items from that index onwards are extracted. If two parameters (with : between them) are used, items between the two indexes (not including the stop index) are sliced with a default step of one.

Example #3

# slice single item 
import numpy as np
a = np.arange(10)
b = a[5]
print(b)

Its output is as follows −

5

Example #4

# slice items starting from index 
import numpy as np
a = np.arange(10)
print(a[2:])

Now, the output would be −

[2 3 4 5 6 7 8 9]

Example #5

# slice items between indexes 
import numpy as np
a = np.arange(10)
print(a[2:5])

Here, the output would be −

[2 3 4]

The above description applies to multi-dimensional ndarray too.

Example #6

import numpy as np 
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
# slice items starting from index
print('Now we will slice the array from the index a[1:]')
print(a[1:])

The output is as follows −

[[1 2 3]
 [3 4 5]
 [4 5 6]]
Now we will slice the array from the index a[1:]
[[3 4 5]
 [4 5 6]]

Slicing can also include an ellipsis (...) to make a selection tuple of the same length as the number of dimensions of the array. If the ellipsis is used at the row position, it returns an ndarray comprising the selected items from every row.

Example #7

# array to begin with 
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:')
print(a)
print('\n')
# this returns array of items in the second column
print('The items in the second column are:')
print(a[...,1])
print('\n')
# Now we will slice all items from the second row
print('The items in the second row are:')
print(a[1,...])
print('\n')
# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[...,1:])

The output of this program is as follows −

Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]

The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]

Copies & Views

No Copy

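Simple assignment does not make a copy of the array object. Instead, both names refer to the same object, so id() returns the same value for each. Furthermore, any change in either is reflected in the other. For example, changing the shape of one changes the shape of the other too.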

Example

import numpy as np 
a = np.arange(6)
print('Our array is:')
print(a)
print('Applying id() function:')
print(id(a))
print('a is assigned to b:')
b = a
print(b)
print('b has same id():')
print(id(b))
print('Change shape of b:')
b.shape = 3,2
print(b)
print('Shape of a also gets changed:')
print(a)

It will produce the following output −

Our array is:
[0 1 2 3 4 5]
Applying id() function:
139747815479536
a is assigned to b:
[0 1 2 3 4 5]
b has same id():
139747815479536
Change shape of b:
[[0 1]
 [2 3]
 [4 5]]
Shape of a also gets changed:
[[0 1]
 [2 3]
 [4 5]]

View or Shallow Copy

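NumPy's ndarray.view() method returns a new array object that looks at the same data as the original array. Unlike the previous case, changing the shape of the new array does not change the shape of the original.

Example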

import numpy as np 
# To begin with, a is 3X2 array
a = np.arange(6).reshape(3,2)
print('Array a:')
print(a)
print('Create view of a:')
b = a.view()
print(b)
print('id() for both the arrays are different:')
print('id() of a:')
print(id(a))
print('id() of b:')
print(id(b))
# Change the shape of b. It does not change the shape of a
b.shape = 2,3
print('Shape of b:')
print(b)
print('Shape of a:')
print(a)

It will produce the following output −

Array a:
[[0 1]
 [2 3]
 [4 5]]
Create view of a:
[[0 1]
 [2 3]
 [4 5]]
id() for both the arrays are different:
id() of a:
140424307227264
id() of b:
140424151696288
Shape of b:
[[0 1 2]
 [3 4 5]]
Shape of a:
[[0 1]
 [2 3]
 [4 5]]

A slice of an array creates a view.

Example

import numpy as np
a = np.array([[10,10], [2,3], [4,5]])
print('Our array is:')
print(a)
print('Create a slice:')
s = a[:, :2]
print(s)

It will produce the following output −

Our array is:
[[10 10]
 [ 2  3]
 [ 4  5]]
Create a slice:
[[10 10]
 [ 2  3]
 [ 4  5]]
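The printed output alone does not show that the slice shares data with the original. A minimal sketch of one way to check this uses the base attribute of the slice and writes through it; the value 99 and the names s and c are arbitrary illustrations.

```python
import numpy as np

a = np.array([[10, 10], [2, 3], [4, 5]])

s = a[:, :1]          # a slice: a view onto the first column of a
print(s.base is a)    # the view records the array that owns its data

s[0, 0] = 99          # writing through the view...
print(a[0, 0])        # ...changes a as well

c = a[:, :1].copy()   # an explicit copy owns its own data
c[0, 0] = -1
print(c.base is None)
print(a[0, 0])        # a is not affected by changes to the copy
```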

Deep Copy

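The ndarray.copy() method creates a deep copy: a complete copy of the array and its data that shares no memory with the original.

Example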

import numpy as np 
a = np.array([[10,10], [2,3], [4,5]])
print('Array a is:')
print(a)
print('Create a deep copy of a:')
b = a.copy()
print('Array b is:')
print(b)
# b does not share any memory of a
print('Can we write b is a')
print(b is a)
print('Change the contents of b:')
b[0,0] = 100
print('Modified array b:')
print(b)
print('a remains unchanged:')
print(a)

It will produce the following output −

Array a is:
[[10 10]
 [ 2  3]
 [ 4  5]]
Create a deep copy of a:
Array b is:
[[10 10]
 [ 2  3]
 [ 4  5]]
Can we write b is a
False
Change the contents of b:
Modified array b:
[[100  10]
 [  2   3]
 [  4   5]]
a remains unchanged:
[[10 10]
 [ 2  3]
 [ 4  5]]

Universal Functions: Fast Element-wise Array Functions

A universal function, or ufunc, performs element-wise operations on the data in an ndarray. Many ufuncs are simple element-wise transformations, like sqrt or exp:

In [120]: arr = np.arange(10)
In [121]: np.sqrt(arr)
Out[121]:
array([ 0. , 1. , 1.4142, 1.7321, 2. , 2.2361, 2.4495,
2.6458, 2.8284, 3. ])
In [122]: np.exp(arr)
Out[122]:
array([ 1. , 2.7183, 7.3891, 20.0855, 54.5982,
148.4132, 403.4288, 1096.6332, 2980.958 , 8103.0839])

These are referred to as unary ufuncs. Others, such as add or maximum, take 2 arrays (thus, binary ufuncs) and return a single array as the result:

In [123]: x = np.random.randn(8)
In [124]: y = np.random.randn(8)
In [125]: x
Out[125]:
array([ 0.0749, 0.0974, 0.2002, -0.2551, 0.4655, 0.9222, 0.446 ,
-0.9337])
In [126]: y
Out[126]:
array([ 0.267 , -1.1131, -0.3361, 0.6117, -1.2323, 0.4788, 0.4315,
-0.7147])
In [127]: np.maximum(x, y) # element-wise maximum
Out[127]:
array([ 0.267 , 0.0974, 0.2002, 0.6117, 0.4655, 0.9222, 0.446 ,
-0.7147])

While not common, a ufunc can return multiple arrays. modf is one example: a vectorized version of the built-in Python divmod, it returns the fractional and integral parts of a floating point array:

In [128]: arr = np.random.randn(7) * 5
In [129]: np.modf(arr)
Out[129]:
(array([-0.6808, 0.0636, -0.386 , 0.1393, -0.8806, 0.9363, -0.883 ]),
array([-2., 4., -3., 5., -3., 3., -6.]))
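Because the session above uses random numbers, its output cannot be reproduced exactly. The following sketch uses fixed, hand-picked inputs (an assumption made for reproducibility) to show a binary ufunc and the two arrays returned by modf.

```python
import numpy as np

x = np.array([1.5, -2.25, 4.0])
y = np.array([0.5, 3.0, -1.0])

total = np.add(x, y)         # binary ufunc: element-wise sum
larger = np.maximum(x, y)    # binary ufunc: element-wise maximum
frac, whole = np.modf(x)     # one ufunc call, two result arrays

print(total)
print(larger)
print(frac)
print(whole)
```

Note that both parts returned by modf carry the sign of the input, so -2.25 splits into -0.25 and -2.0.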

Advantages of NumPy

  • The core of NumPy is its arrays. One of the main advantages of NumPy arrays is that they take less memory and run faster than comparable Python data structures (lists and tuples).
  • NumPy supports scientific functionality such as linear algebra, which helps us solve systems of linear equations.
  • NumPy supports vectorized operations, like element-wise addition and multiplication, and functions such as the Kronecker product. Python lists do not support these features.
  • It is a very good substitute for MATLAB, Octave, etc., as it provides similar functionality with faster development and less mental overhead (Python is easy to write and comprehend).
  • NumPy is very good for data analysis.
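The vectorized operations mentioned in the list can be sketched as follows, with small illustrative vectors. The last line shows what the + operator does on plain lists, for contrast.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([10, 20, 30])

elementwise_sum = u + v              # element-wise addition
elementwise_product = u * v          # element-wise multiplication
kron = np.kron([1, 2], [1, 10])      # Kronecker product of two small vectors

list_plus = [1, 2, 3] + [10, 20, 30]  # on lists, + concatenates instead

print(elementwise_sum)
print(elementwise_product)
print(kron)
print(list_plus)
```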

Disadvantages of NumPy

  • Using “nan” in NumPy: “nan” stands for “not a number”. It was designed to address the problem of missing values. NumPy itself supports “nan”, but lack of cross-platform support within Python makes it difficult for the user. That is why we may face problems when comparing values within the Python interpreter; for example, nan does not compare equal to itself.
  • Requires a contiguous allocation of memory: insertion and deletion are costly because the data is stored in contiguous memory, so adding or removing an element means shifting or copying the elements around it.
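Both points can be seen in a short sketch. The values are arbitrary, and the names values, base and grown are invented for the illustration.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# nan never compares equal to anything, including itself.
nan_equals_itself = (np.nan == np.nan)
# Use isnan to find missing values, and nan-aware functions to skip them.
missing = np.isnan(values)
mean_without_nan = np.nanmean(values)

# Insertion does not grow the array in place: np.insert builds a new array.
base = np.arange(5)
grown = np.insert(base, 2, 99)

print(nan_equals_itself, missing, mean_without_nan)
print(base, grown)
```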

Linear Algebra with NumPy

For example, to construct a numpy array that corresponds to the matrix

$$A = \begin{bmatrix} 1 & -1 & 2 \\ 3 & 2 & 0 \end{bmatrix}$$

we would do

A = np.array([[1,-1,2],[3,2,0]])

Vectors are just arrays with a single column. For example, to construct a vector

$$v = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$$

we would do

v = np.array([[2],[1],[3]])

A more convenient approach is to transpose the corresponding row vector. For example, to make the vector above we could instead transpose the row vector

$$\begin{bmatrix} 2 & 1 & 3 \end{bmatrix}$$

The code for this is

v = np.transpose(np.array([[2,1,3]]))

numpy overloads the array index and slicing notations to access parts of a matrix. For example, to print the bottom right entry in the matrix A we would do

print(A[1,2])

To slice out the second column in the A matrix we would do

col = A[:,1:2]

The first slice selects all rows in A, while the second slice selects just the middle entry in each row.

To do a matrix multiplication or a matrix-vector multiplication we use the np.dot() function.

w = np.dot(A,v)
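Putting the pieces above together, a runnable sketch of the product looks like this. The @ operator on the last lines is an addition that assumes Python 3.5 or later; it is not used elsewhere in this post.

```python
import numpy as np

A = np.array([[1, -1, 2], [3, 2, 0]])       # 2 x 3 matrix
v = np.transpose(np.array([[2, 1, 3]]))     # 3 x 1 column vector

w = np.dot(A, v)                            # 2 x 1 result
print(w.shape)
print(w)

# Assumption: Python 3.5+ also accepts the @ operator for the same product.
w_alt = A @ v
print(np.array_equal(w, w_alt))
```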

Solving systems of equations with numpy

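We seek the vector x that solves the equation

A x = b

where

$$A = \begin{bmatrix} 2 & 1 & -2 \\ 3 & 0 & 1 \\ 1 & 1 & -1 \end{bmatrix}, \quad b = \begin{bmatrix} -3 \\ 5 \\ -2 \end{bmatrix}$$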

We start by constructing the arrays for A and b.

A = np.array([[2,1,-2],[3,0,1],[1,1,-1]])
b = np.transpose(np.array([[-3,5,-2]]))

To solve the system we do

x = np.linalg.solve(A,b)
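As a quick check, the sketch below solves the system and verifies the answer by substituting it back into A x = b. The use of np.allclose is an addition for the check, since the result is floating point.

```python
import numpy as np

A = np.array([[2, 1, -2], [3, 0, 1], [1, 1, -1]])
b = np.transpose(np.array([[-3, 5, -2]]))

x = np.linalg.solve(A, b)
print(x)

# Substitute the solution back in: A x should reproduce b up to rounding.
print(np.allclose(np.dot(A, x), b))
```

Working the system by hand gives x = (1, -1, 2), which is what the solver returns up to floating point rounding.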

Application: multiple linear regression

In a simple least-squares linear regression model we seek a vector β such that the product Xβ most closely approximates the outcome vector y.

Once we have constructed the β vector we can use it to map input data to a predicted outcome. Given an input vector in the form

$$x = \begin{bmatrix} 1 & x_1 & x_2 & \cdots & x_k \end{bmatrix}$$

we can compute a predicted outcome value

$$\hat{y} = x \beta$$

The formula to compute the β vector is

$$\beta = (X^T X)^{-1} X^T y$$

In our next example program I will use numpy to construct the appropriate matrices and vectors and solve for the β vector. Once we have solved for β we will use it to make predictions for some test data points that we initially left out of our input data set.

Assuming we have constructed the input matrix X and the outcomes vector y in numpy, the following code will compute the β vector:

Xt = np.transpose(X)
XtX = np.dot(Xt,X)
Xty = np.dot(Xt,y)
beta = np.linalg.solve(XtX,Xty)

The last line uses np.linalg.solve to compute β, since the equation

$$\beta = (X^T X)^{-1} X^T y$$

is mathematically equivalent to the system of equations

$$(X^T X)\,\beta = X^T y$$
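Before turning to real data, the normal equations can be tried on a small synthetic problem where the answer is known. Everything in this sketch is an assumption made for illustration: the seed, the coefficients in true_beta, and the noise-free outcomes. The comparison with np.linalg.lstsq is an extra cross-check, not part of the method described above.

```python
import numpy as np

rng = np.random.RandomState(0)      # fixed seed so the run is repeatable
n = 50
features = rng.rand(n, 2)           # two made-up input variables

# Prepend a column of ones so the first coefficient acts as the intercept.
X = np.hstack([np.ones((n, 1)), features])
true_beta = np.array([[4.0], [3.0], [-2.0]])
y = np.dot(X, true_beta)            # noise-free outcomes

# Solve (X^T X) beta = X^T y instead of forming the inverse explicitly.
Xt = np.transpose(X)
beta = np.linalg.solve(np.dot(Xt, X), np.dot(Xt, y))

# Cross-check against NumPy's least-squares routine.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta.ravel())
print(np.allclose(beta, beta_lstsq))
```

With noise-free data the recovered coefficients match true_beta up to rounding, which makes this a convenient sanity check for the formula.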

The data set I will use for this example is the Windsor house price data set, which contains information about home sales in the Windsor, Ontario area. The input variables cover a range of factors that may potentially have an impact on house prices, such as lot size, number of bedrooms, and the presence of various amenities. A CSV file with the full data set is available here. I downloaded the data set from this site, which offers a large number of data sets covering a large range of topics.

Here now is the source code for the example program.

import csv
import numpy as np

def readData():
    X = []
    y = []
    with open('Housing.csv') as f:
        rdr = csv.reader(f)
        # Skip the header row
        next(rdr)
        # Read X and y
        for line in rdr:
            # Leading 1.0 is the constant term for the intercept
            xline = [1.0]
            for s in line[:-1]:
                xline.append(float(s))
            X.append(xline)
            # The last column holds the outcome
            y.append(float(line[-1]))
    return (X,y)

X0,y0 = readData()
# Convert all but the last 10 rows of the raw data to numpy arrays
d = len(X0)-10
X = np.array(X0[:d])
y = np.transpose(np.array([y0[:d]]))
# Compute beta
Xt = np.transpose(X)
XtX = np.dot(Xt,X)
Xty = np.dot(Xt,y)
beta = np.linalg.solve(XtX,Xty)
print(beta)
# Make predictions for the last 10 rows in the data set
for data,actual in zip(X0[d:],y0[d:]):
    x = np.array([data])
    prediction = np.dot(x,beta)
    print('prediction = '+str(prediction[0,0])+' actual = '+str(actual))

The original data set consists of over 500 entries. To test the accuracy of the predictions made by the linear regression model we use all but the last 10 data entries to build the regression model and compute β. Once we have constructed the β vector we use it to make predictions for the last 10 input values and then compare the predicted home prices against the actual home prices from the data set.

Here are the outputs produced by the program:

[[ -4.14106096e+03]
[ 3.55197583e+00]
[ 1.66328263e+03]
[ 1.45465644e+04]
[ 6.77755381e+03]
[ 6.58750520e+03]
[ 4.44683380e+03]
[ 5.60834856e+03]
[ 1.27979572e+04]
[ 1.24091640e+04]
[ 4.19931185e+03]
[ 9.42215457e+03]]
prediction = 97360.6550969 actual = 82500.0
prediction = 71774.1659014 actual = 83000.0
prediction = 92359.0891976 actual = 84000.0
prediction = 77748.2742379 actual = 85000.0
prediction = 91015.5903066 actual = 85000.0
prediction = 97545.1179047 actual = 91500.0
prediction = 97360.6550969 actual = 94000.0
prediction = 106006.800756 actual = 103000.0
prediction = 92451.6931269 actual = 105000.0
prediction = 73458.2949381 actual = 105000.0

References:

https://www.educba.com/numpy-ndarray/

https://towardsdatascience.com/a-hitchhiker-guide-to-python-numpy-arrays-9358de570121

https://www.tutorialspoint.com/numpy/numpy_indexing_and_slicing.htm

That’s all for this particular post. I will come up with another set of interesting Data Science topics in another post.

Thanks for Reading, keep learning !!!

Passionate about ML