NumPy Basics

Wasim Alam
18 min read · Jan 8, 2021

NumPy. This library has become so fundamental that it is hard to imagine research and data science without it. NumPy has been around since 2005, and if you have ever worked with data in Python, you have almost certainly used it, directly or indirectly.

What is NumPy?

So what is NumPy? According to the official website, NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

Features

  • Its core is written in C and exposed through a Python interface
  • Multidimensional homogeneous arrays: the ndarray is an n-dimensional array
  • Various functions for arrays
  • Reshaping of arrays
  • Together with Python, it can be used as an alternative to MATLAB

One trade-off of using Python is its computing speed. On the other hand, C is known for its high speed. Hence, the developers came to the conclusion of writing a package of numerical functions which is written in C, but which you can run from Python. So, without having to learn C, you can use its power in Python.

The biggest advantage of NumPy is its ability to handle numerical arrays. For example, if you have a list of values and you want to square each of them, the code in base Python will look like:

a = [1, 2, 3, 4, 5]
b = []
for i in a:
    b.append(i**2)

and you will get [1, 4, 9, 16, 25] for b. Now, if you want to do the same with a 2-dimensional array, the base Python to do this is:

a = [[1, 2], [3, 4]]
b = [[],[]]
for i in range(len(a)):
    for j in range(len(a[i])):
        b[i].append(a[i][j]**2)

This would give you b equal to [[1, 4], [9, 16]]. To do the same with a 3D array you would need 3 nested loops and to do it in 4D would require 4 nested loops. However, with NumPy you can take the square of an array of any dimensions using the same line of code and no loops:

import numpy as np
b = np.array(a)**2

Using numpy is much faster than the base python version! It is faster to run, saving you on computing time, and faster to write, saving you time writing your code. All of this allows you to write and run code much faster, and therefore do more science in less time. Not only that, if your friend has a look at your code, they will read the code and understand you want a squared value of the array in an instant, without having to decipher what the for loop is trying to do.
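As a small illustration of the point about dimensions, here is a sketch that squares a 3D nested list with the same one-line expression. The names a3 and b3 are made up for this example.

```python
import numpy as np

# A 2 x 2 x 2 nested list (three levels of nesting)
a3 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# One expression squares every element, whatever the number of dimensions
b3 = np.array(a3) ** 2

print(b3.shape)     # (2, 2, 2)
print(b3.tolist())  # [[[1, 4], [9, 16]], [[25, 36], [49, 64]]]
```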

NumPy serves as the basis of most scientific packages in Python, including pandas, matplotlib, scipy, etc. Hence, it would be a good idea to explore the basics of data handling in Python with NumPy.

Installation requirements

Let’s take a look at the various requirements we need to set up before we proceed.

The code is based on the Python 3.4/2.7-compatible version and NumPy version 1.9. The easiest way to install these requirements (and more) is to install a complete Python distribution, such as Enthought Canopy, EPD, Anaconda, or Python (x,y). Once you have installed any one of these, you can safely skip the remainder of this section and should be ready to begin.

Using Python package managers

You can also use Python package managers, such as enpkg, Conda, pip or easy_install, to install the requirements using one of the following commands; replace numpy with any other package name you'd like to install, for example, ipython, matplotlib and so on:

$ pip install numpy
$ easy_install numpy
$ enpkg numpy # for Canopy users
$ conda install numpy # for Anaconda users

Using native package managers

If the Python interpreter you want to use comes with the OS and is not a third-party installation, you may prefer using OS-specific package managers such as aptitude, yum, or Homebrew. The following table illustrates the package managers and the respective commands used to install NumPy:

Package manager    Command
Aptitude           $ sudo apt-get install python-numpy
Yum                $ yum install python-numpy
Homebrew           $ brew install numpy

Note that, when installing NumPy (or any other Python modules) on OS X systems with Homebrew, Python should have been originally installed with Homebrew.

Detailed installation instructions are available on the respective websites of NumPy, IPython, and matplotlib. As a precaution, to check whether NumPy was installed properly, open an IPython terminal and type the following commands:

In [1]: import numpy as np 

If the first statement looks like it does nothing, this is a good sign. If it executes without any output, this means that NumPy was installed and has been imported properly into your Python session.

Congratulations! We are now ready to begin.

Why should we use NumPy?

We use a NumPy array instead of a Python list for three reasons:

  1. Less memory usage
  2. Fast performance
  3. Convenient to work with

The first reason to prefer NumPy arrays is that they take less memory than a Python list holding the same values. They are also faster in execution and, at the same time, convenient and easy to work with.
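The memory claim can be checked with a rough sketch like the one below. It assumes CPython and a 64-bit integer dtype; the exact byte counts for the list vary by Python version, so only the array figure is stated in the comments.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects; count both parts
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores the raw values in one contiguous buffer
array_bytes = np_arr.nbytes  # 1000 elements * 8 bytes each = 8000

print("list :", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
```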

What can we do with Numpy?

Built-in support for Arrays is not available in python, but we can use python lists as arrays.

arrayA = ['Hello', 'world']
print(arrayA)

But it’s still a python list, not an array.

So here comes NumPy, which we can use to create multidimensional (2D, 3D, and so on) arrays. We can also do computations on arrays.

import numpy as num
arr = num.array([1,2,3,4,5,6])
print(arr)

Creates array arr.

Then, for 2D and 3D arrays,

import numpy as num
arr = num.array([(1,2,3,4,5),(6,7,8,9,10)])
print(arr)

– If you want to know the number of dimensions of your array, use the ndim attribute.

print(arr.ndim)

– If you want to find out the size of an array (its total number of elements), use the size attribute.

print(arr.size)

– To find out the shape of an array, use the shape attribute.

print(arr.shape)

For a 2D array it returns a tuple of (rows, columns).

You can also use slicing, reshaping and many more methods with numpy arrays.
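A minimal sketch of reshaping and slicing, using a made-up array named grid, might look like this:

```python
import numpy as np

arr = np.arange(1, 11)        # the integers 1 through 10
grid = arr.reshape(2, 5)      # 2 rows, 5 columns

print(grid.ndim)    # 2
print(grid.size)    # 10
print(grid.shape)   # (2, 5)

first_row = grid[0]       # the values 1 to 5
last_col = grid[:, -1]    # last element of each row: 5 and 10
middle = grid[:, 1:4]     # columns 1, 2 and 3 of every row
```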

Why do we need NumPy?

NumPy is needed to perform logical and mathematical computations on arrays and matrices. It performs these operations far more efficiently than Python lists.

NumPy Ndarray

Ndarray is one of the most important classes in the NumPy library. It is a multidimensional (n-dimensional) array of fixed size with homogeneous elements (i.e. the data type of all the elements in the array is the same).

In NumPy, the number of dimensions of an array is called its rank and is available through the ndim attribute. For example, 1D, 2D, and 3D arrays have ranks 1, 2 and 3 respectively.

Syntax:

np.ndarray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)

Here, the shape parameter gives the size of the array along each dimension, and therefore the number of elements. The dtype parameter gives the data type of the elements. The buffer parameter is an object exposing the buffer interface, and offset is the offset (in bytes) of the array data within that buffer. The strides parameter specifies the number of bytes between the starts of successive elements along each dimension.

For a contiguous array, the smallest stride equals the size of the element data type. Finally, the order parameter specifies whether we want row-major or column-major order. Of all the parameters above, only shape is compulsory; dtype defaults to float, and the rest are optional and can be specified as required.
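To make buffer, offset and strides more concrete, here is a hedged sketch that builds a small window onto the memory of an existing array. The names a and window are illustrative, and the dtype is fixed to 64-bit integers so that the byte counts are predictable.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Strides are in bytes: 4 elements * 8 bytes to reach the next row,
# 8 bytes to reach the next column
print(a.strides)      # (32, 8)

# Build a 2x2 window onto a's memory, starting one element (8 bytes) in
window = np.ndarray(shape=(2, 2), dtype=np.int64, buffer=a,
                    offset=8, strides=(32, 8))
print(window.tolist())   # [[1, 2], [5, 6]]
```

In everyday code you would rarely call np.ndarray directly; slicing such as a[:2, 1:3] produces the same window.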

Working with Ndarray

An array can be created using the following functions :

  • np.ndarray(shape, dtype): Creates an array of the given shape without initializing it, so its contents are arbitrary values.
  • np.array(array_object): Creates an array from a list or tuple.
  • np.zeros(shape): Creates an array of the given shape with all zeros.
  • np.ones(shape): Creates an array of the given shape with all ones.
  • np.full(shape, fill_value, dtype): Creates an array of the given shape with every element set to fill_value.
  • np.arange(start, stop, step): Creates an array of evenly spaced values within the specified range.
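The creation functions listed above can be tried out with a short sketch like the following; the variable names are illustrative only.

```python
import numpy as np

zeros = np.zeros((2, 3))                 # 2x3 array of 0.0
ones = np.ones((2, 3))                   # 2x3 array of 1.0
sevens = np.full((2, 2), 7)              # 2x2 array filled with 7
steps = np.arange(0, 10, 2)              # 0, 2, 4, 6, 8
from_list = np.array([[1, 2], [3, 4]])   # built from a nested list
empty = np.empty((2, 2))                 # uninitialized: contents are arbitrary

print(zeros.shape, ones.shape, sevens.shape, steps.shape, from_list.shape)
```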

Examples of Ndarray

Given below are the examples of Ndarray:

Example #1: Attributes of a multidimensional array(ndarray)

import numpy as np
#creating an array to understand its attributes
A = np.array([[1,2,3],[1,2,3],[1,2,3]])
print("Array A is:\n",A)
#type of array
print("Type:", type(A))
#Shape of array
print("Shape:", A.shape)
#no. of dimensions
print("Rank:", A.ndim)
#size of array
print("Size:", A.size)
#type of each element in the array
print("Element type:", A.dtype)

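Output (the element type may show as int32 on some platforms):

Array A is:
 [[1 2 3]
 [1 2 3]
 [1 2 3]]
Type: <class 'numpy.ndarray'>
Shape: (3, 3)
Rank: 2
Size: 9
Element type: int64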

Indexing & Slicing

Contents of ndarray object can be accessed and modified by indexing or slicing, just like Python’s in-built container objects.

Items in an ndarray object follow a zero-based index. Three types of indexing methods are available − field access, basic slicing and advanced indexing.
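As a quick orientation, the three indexing methods can be sketched as follows. The structured array and its field names ("id", "score") are hypothetical and exist only for this illustration.

```python
import numpy as np

# Field access on a structured array (field names here are hypothetical)
people = np.array([(1, 2.5), (2, 4.0)], dtype=[("id", "i4"), ("score", "f8")])
ids = people["id"]              # the "id" field of every record: 1 and 2

# Basic slicing
a = np.arange(10)
evens = a[::2]                  # every second item: 0, 2, 4, 6, 8

# Advanced indexing: an integer list and a boolean mask
picked = a[[1, 3, 5]]           # items at positions 1, 3 and 5
large = a[a > 6]                # items greater than 6: 7, 8, 9
```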

Basic slicing is an extension of Python’s basic concept of slicing to n dimensions. A Python slice object is constructed by giving start, stop, and step parameters to the built-in slice function. This slice object is passed to the array to extract a part of array.

Example #1

import numpy as np
a = np.arange(10)
s = slice(2,7,2)
print(a[s])

Its output is as follows −

[2 4 6]

In the above example, an ndarray object is prepared by arange() function. Then a slice object is defined with start, stop, and step values 2, 7, and 2 respectively. When this slice object is passed to the ndarray, a part of it starting with index 2 up to 7 with a step of 2 is sliced.

The same result can also be obtained by giving the slicing parameters separated by a colon : (start:stop:step) directly to the ndarray object.

Example #2

import numpy as np
a = np.arange(10)
b = a[2:7:2]
print(b)

Here, we will get the same output −

[2 4 6]

If only one parameter is given, a single item corresponding to the index will be returned. If a : follows it, all items from that index onwards will be extracted. If two parameters (with : between them) are used, items between the two indexes (not including the stop index) with default step one are sliced.

Example #3

# slice single item 
import numpy as np
a = np.arange(10)
b = a[5]
print(b)

Its output is as follows −

5

Example #4

# slice items starting from index
import numpy as np
a = np.arange(10)
print(a[2:])

Now, the output would be −

[2 3 4 5 6 7 8 9]

Example #5

# slice items between indexes
import numpy as np
a = np.arange(10)
print(a[2:5])

Here, the output would be −

[2 3 4]

The above description applies to multi-dimensional ndarray too.

Example #6

import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
# slice items starting from index
print('Now we will slice the array from the index a[1:]')
print(a[1:])

The output is as follows −

[[1 2 3]
 [3 4 5]
 [4 5 6]]
Now we will slice the array from the index a[1:]
[[3 4 5]
 [4 5 6]]

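Slicing can also include an ellipsis (...) to make a selection tuple of the same length as the number of dimensions of the array. The ellipsis stands for all the axes that are not listed explicitly: if it is used at the row position, the selection applies to the columns across all rows, and if it is used at the column position, the selection applies to the rows across all columns.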

Example #7

# array to begin with
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:')
print(a)
print()
# this returns array of items in the second column
print('The items in the second column are:')
print(a[...,1])
print()
# Now we will slice all items from the second row
print('The items in the second row are:')
print(a[1,...])
print()
# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[...,1:])

The output of this program is as follows −

Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]

The items in the second column are:
[2 4 5]

The items in the second row are:
[3 4 5]

The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]
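The ellipsis becomes more useful with three or more dimensions, where it saves writing out every intermediate axis. A minimal sketch, using a made-up 3D array named c:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# The ellipsis stands for "all remaining axes"
first_of_last_axis = c[..., 0]    # same as c[:, :, 0]
first_block = c[0, ...]           # same as c[0, :, :]

print(first_of_last_axis.shape)   # (2, 3)
print(first_block.shape)          # (3, 4)
```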

Copies & Views

Some functions return a copy of the input array, while others return a view. When the contents are physically stored in another location, the result is called a copy. If, on the other hand, a different view of the same memory content is provided, we call it a view.

No Copy

Simple assignments do not make a copy of the array object. Instead, the new name refers to the same object, so it has the same id() as the original array. The id() function returns a unique identifier of a Python object, similar to a pointer in C.

Furthermore, any change made through either name is reflected in the other. For example, changing the shape of one changes the shape of the other too.

Example

import numpy as np
a = np.arange(6)
print('Our array is:')
print(a)
print('Applying id() function:')
print(id(a))
print('a is assigned to b:')
b = a
print(b)
print('b has same id():')
print(id(b))
print('Change shape of b:')
b.shape = 3,2
print(b)
print('Shape of a also gets changed:')
print(a)

It will produce the following output −

Our array is:
[0 1 2 3 4 5]
Applying id() function:
139747815479536
a is assigned to b:
[0 1 2 3 4 5]
b has same id():
139747815479536
Change shape of b:
[[0 1]
 [2 3]
 [4 5]]
Shape of a also gets changed:
[[0 1]
 [2 3]
 [4 5]]

View or Shallow Copy

NumPy has the ndarray.view() method, which returns a new array object that looks at the same data as the original array. Unlike the earlier case, a change in the dimensions of the new array doesn’t change the dimensions of the original.

Example

import numpy as np
# To begin with, a is 3X2 array
a = np.arange(6).reshape(3,2)
print('Array a:')
print(a)
print('Create view of a:')
b = a.view()
print(b)
print('id() for both the arrays are different:')
print('id() of a:')
print(id(a))
print('id() of b:')
print(id(b))
# Change the shape of b. It does not change the shape of a
b.shape = 2,3
print('Shape of b:')
print(b)
print('Shape of a:')
print(a)

It will produce the following output −

Array a:
[[0 1]
 [2 3]
 [4 5]]
Create view of a:
[[0 1]
 [2 3]
 [4 5]]
id() for both the arrays are different:
id() of a:
140424307227264
id() of b:
140424151696288
Shape of b:
[[0 1 2]
 [3 4 5]]
Shape of a:
[[0 1]
 [2 3]
 [4 5]]

Slice of an array creates a view.

Example

import numpy as np
a = np.array([[10,10], [2,3], [4,5]])
print('Our array is:')
print(a)
print('Create a slice:')
s = a[:, :2]
print(s)

It will produce the following output −

Our array is:
[[10 10]
 [ 2  3]
 [ 4  5]]
Create a slice:
[[10 10]
 [ 2  3]
 [ 4  5]]
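The printed arrays look identical, so the sharing is not visible from the output alone. A short sketch that makes it visible, by writing through a slice and checking with np.shares_memory, could look like this:

```python
import numpy as np

a = np.array([[10, 10], [2, 3], [4, 5]])
s = a[:, :1]                   # first column, as a view

print(np.shares_memory(a, s))  # True
s[0, 0] = 99                   # writing through the view...
print(a[0, 0])                 # 99  ...changes the original

c = a[:, :1].copy()            # an explicit copy is independent
c[0, 0] = -1
print(a[0, 0])                 # still 99
```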

Deep Copy

The ndarray.copy() method creates a deep copy. It is a complete copy of the array and its data, and doesn’t share memory with the original array.

Example

import numpy as np
a = np.array([[10,10], [2,3], [4,5]])
print('Array a is:')
print(a)
print('Create a deep copy of a:')
b = a.copy()
print('Array b is:')
print(b)
# b does not share any memory of a
print('Can we write b is a')
print(b is a)
print('Change the contents of b:')
b[0,0] = 100
print('Modified array b:')
print(b)
print('a remains unchanged:')
print(a)

It will produce the following output −

Array a is:
[[10 10]
 [ 2  3]
 [ 4  5]]
Create a deep copy of a:
Array b is:
[[10 10]
 [ 2  3]
 [ 4  5]]
Can we write b is a
False
Change the contents of b:
Modified array b:
[[100  10]
 [  2   3]
 [  4   5]]
a remains unchanged:
[[10 10]
 [ 2  3]
 [ 4  5]]

Universal Functions: Fast Element-wise Array Functions

A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

Many ufuncs are simple elementwise transformations, like sqrt or exp:

In [120]: arr = np.arange(10)
In [121]: np.sqrt(arr)
Out[121]:
array([ 0. , 1. , 1.4142, 1.7321, 2. , 2.2361, 2.4495,
2.6458, 2.8284, 3. ])
In [122]: np.exp(arr)
Out[122]:
array([ 1. , 2.7183, 7.3891, 20.0855, 54.5982,
148.4132, 403.4288, 1096.6332, 2980.958 , 8103.0839])

These are referred to as unary ufuncs. Others, such as add or maximum, take 2 arrays (thus, binary ufuncs) and return a single array as the result:

In [123]: x = np.random.randn(8)
In [124]: y = np.random.randn(8)
In [125]: x
Out[125]:
array([ 0.0749, 0.0974, 0.2002, -0.2551, 0.4655, 0.9222, 0.446 ,
-0.9337])
In [126]: y
Out[126]:
array([ 0.267 , -1.1131, -0.3361, 0.6117, -1.2323, 0.4788, 0.4315,
-0.7147])
In [127]: np.maximum(x, y) # element-wise maximum
Out[127]:
array([ 0.267 , 0.0974, 0.2002, 0.6117, 0.4655, 0.9222, 0.446 ,
-0.7147])

While not common, a ufunc can return multiple arrays. modf is one example, a vectorized version of Python's math.modf: it returns the fractional and integral parts of a floating point array:

In [128]: arr = np.random.randn(7) * 5
In [129]: np.modf(arr)
Out[129]:
(array([-0.6808, 0.0636, -0.386 , 0.1393, -0.8806, 0.9363, -0.883 ]),
array([-2., 4., -3., 5., -3., 3., -6.]))
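Because the session above uses random numbers, its values change from run to run. A deterministic sketch with hand-picked inputs (the names vals, frac and whole are illustrative) shows the same behavior:

```python
import numpy as np

vals = np.array([1.5, -2.25, 3.0])

# modf returns two arrays: fractional parts and integral parts
frac, whole = np.modf(vals)
print(frac)    # fractional parts: 0.5, -0.25, 0.0
print(whole)   # integral parts:   1.0, -2.0, 3.0

# A binary ufunc for comparison: element-wise addition restores the input
total = np.add(frac, whole)
```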

Advantages of NumPy

The following points explain the advantages of NumPy:

  • The core of NumPy is its arrays. One of the main advantages of using NumPy arrays is that they take less memory space and provide better runtime speed when compared with similar data structures in Python (lists and tuples).
  • NumPy supports specific scientific functions such as linear algebra, which help us solve linear equations.
  • NumPy supports vectorized operations, like elementwise addition and multiplication, computing the Kronecker product, etc. Python lists do not support these features.
  • It is a very good substitute for MATLAB, Octave, etc., as it provides similar functionality with faster development and less mental overhead (as Python is easy to write and comprehend).
  • NumPy is very good for data analysis.
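A small sketch of the vectorized operations mentioned above, contrasted with what plain lists do; the arrays x and y are made up for this example.

```python
import numpy as np

x = np.array([1, 2])
y = np.array([3, 4])

elementwise_sum = x + y        # [4 6]
elementwise_prod = x * y       # [3 8]
kron = np.kron(x, y)           # Kronecker product: [3 4 6 8]

# Plain lists concatenate instead of adding element-wise
list_sum = [1, 2] + [3, 4]     # [1, 2, 3, 4]

print(elementwise_sum, elementwise_prod, kron, list_sum)
```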

Disadvantages of NumPy

The following points explain the disadvantages of NumPy:

  • Using “nan” in NumPy: “NaN” stands for “not a number” and is commonly used to mark missing values. NumPy itself supports NaN, but plain Python has no special handling for it, and NaN never compares equal to anything, including itself. That’s why we may face problems when comparing values within the Python interpreter.
  • Requires a contiguous allocation of memory: Insertion and deletion operations become costly because the data is stored in contiguous memory locations, so adding or removing an element requires shifting or copying the rest of the data.
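Both points can be seen in a few lines. This is only an illustrative sketch with made-up data: it shows the NaN comparison pitfall together with NumPy's NaN-aware helpers, and that np.insert builds a new array rather than growing the existing one.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

nan_equals_itself = (np.nan == np.nan)   # False: NaN is never equal to anything
mask = np.isnan(data)                    # marks the NaN entry instead
mean_skipping = np.nanmean(data)         # 2.0, ignoring the NaN

# Insertion returns a new, larger array; the original is left untouched
bigger = np.insert(data, 1, 2.0)

print(nan_equals_itself, mask, mean_skipping, bigger.shape, data.shape)
```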

Linear Algebra with NumPy

The numpy ndarray class is used to represent both matrices and vectors. To construct a matrix in numpy we list the rows of the matrix in a list and pass that list to the numpy array constructor.

For example, to construct a numpy array that corresponds to the matrix

A = [ 1  -1  2 ]
    [ 3   2  0 ]

we would do

A = np.array([[1,-1,2],[3,2,0]])

Vectors are just arrays with a single column. For example, to construct a vector

v = [ 2 ]
    [ 1 ]
    [ 3 ]

we would do

v = np.array([[2],[1],[3]])

A more convenient approach is to transpose the corresponding row vector. For example, to make the vector above we could instead transpose the row vector

[ 2  1  3 ]

The code for this is

v = np.transpose(np.array([[2,1,3]]))

numpy overloads the array index and slicing notations to access parts of a matrix. For example, to print the bottom right entry in the matrix A we would do

print(A[1,2])

To slice out the second column in the A matrix we would do

col = A[:,1:2]

The first slice selects all rows in A, while the second slice selects just the middle entry in each row.

To do a matrix multiplication or a matrix-vector multiplication we use the np.dot() function.

w = np.dot(A,v)
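Putting the pieces of this section together, the product can be checked with the A and v defined earlier. The @ operator shown alongside np.dot is standard Python/NumPy syntax for the same matrix product, not something the original example uses.

```python
import numpy as np

A = np.array([[1, -1, 2], [3, 2, 0]])   # 2x3 matrix from the text
v = np.array([[2], [1], [3]])           # 3x1 column vector from the text

w = np.dot(A, v)      # (2x3) times (3x1) gives a 2x1 result
w_alt = A @ v         # the @ operator performs the same matrix product

print(w.tolist())     # [[7], [8]]
```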

Solving systems of equations with numpy

One of the more common problems in linear algebra is solving a matrix-vector equation. Here is an example. We seek the vector x that solves the equation

A x = b

where

A = [ 2  1  -2 ]        b = [ -3 ]
    [ 3  0   1 ]            [  5 ]
    [ 1  1  -1 ]            [ -2 ]

We start by constructing the arrays for A and b.

A = np.array([[2,1,-2],[3,0,1],[1,1,-1]])
b = np.transpose(np.array([[-3,5,-2]]))

To solve the system we do

x = np.linalg.solve(A,b)
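A self-contained version of this example, with a check that substitutes the solution back into the equation, might look like the sketch below. The use of np.allclose for the check is an addition for illustration.

```python
import numpy as np

A = np.array([[2, 1, -2], [3, 0, 1], [1, 1, -1]])
b = np.transpose(np.array([[-3, 5, -2]]))

x = np.linalg.solve(A, b)
print(x.ravel())                  # approximately 1, -1, 2

# Check the solution by substituting it back into A x = b
residual_ok = np.allclose(np.dot(A, x), b)
print(residual_ok)                # True
```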

Application: multiple linear regression

In a multiple regression problem we seek a function that can map input data points to outcome values. Each data point is a feature vector (x1 , x2 , …, xm) composed of two or more data values that capture various features of the input. To represent all of the input data along with the vector of output values we set up an input matrix X and an output vector y. Each row of X holds one data point, preceded by a constant 1 that provides the intercept term, and the matching entry of y holds the outcome for that data point.

In a simple least-squares linear regression model we seek a vector β such that the product Xβ most closely approximates the outcome vector y.

Once we have constructed the β vector we can use it to map input data to predicted outcomes. Given an input vector in the form

x = (1, x1, x2, …, xm)

we can compute a predicted outcome value

ŷ = x β

The formula to compute the β vector is

β = (X^T X)^{-1} X^T y

In our next example program I will use numpy to construct the appropriate matrices and vectors and solve for the β vector. Once we have solved for β we will use it to make predictions for some test data points that we initially left out of our input data set.

Assuming we have constructed the input matrix X and the outcomes vector y in numpy, the following code will compute the β vector:

Xt = np.transpose(X)
XtX = np.dot(Xt,X)
Xty = np.dot(Xt,y)
beta = np.linalg.solve(XtX,Xty)

The last line uses np.linalg.solve to compute β, since the equation

β = (X^T X)^{-1} X^T y

is mathematically equivalent to the system of equations

(X^T X) β = X^T y
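To see this equivalence in action without the housing data, here is a hedged sketch on synthetic data. Everything in it (the random data, the coefficients 3, 2 and -1, the noise level) is made up for illustration; it compares the normal-equations solution with NumPy's least-squares solver np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): y = 3 + 2*x1 - x2 + small noise
features = rng.normal(size=(50, 2))
X = np.column_stack([np.ones(50), features])   # leading column of ones
true_beta = np.array([[3.0], [2.0], [-1.0]])
y = X @ true_beta + 0.01 * rng.normal(size=(50, 1))

# Normal equations, as in the text: solve (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Library least-squares solver, for comparison
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta.ravel())
print(beta_lstsq.ravel())
```

Both approaches should agree closely here; a dedicated least-squares solver is generally the more numerically robust choice when the columns of X are nearly dependent.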

The data set I will use for this example is the Windsor house price data set, which contains information about home sales in the Windsor, Ontario area. The input variables cover a range of factors that may potentially have an impact on house prices, such as lot size, number of bedrooms, and the presence of various amenities. A CSV file with the full data set is available here. I downloaded the data set from this site, which offers a large number of data sets covering a large range of topics.

Here now is the source code for the example program.

import csv
import numpy as np

def readData():
    X = []
    y = []
    with open('Housing.csv') as f:
        rdr = csv.reader(f)
        # Skip the header row
        next(rdr)
        # Read X and y
        for line in rdr:
            xline = [1.0]
            for s in line[:-1]:
                xline.append(float(s))
            X.append(xline)
            y.append(float(line[-1]))
    return (X,y)

X0,y0 = readData()
# Convert all but the last 10 rows of the raw data to numpy arrays
d = len(X0)-10
X = np.array(X0[:d])
y = np.transpose(np.array([y0[:d]]))
# Compute beta
Xt = np.transpose(X)
XtX = np.dot(Xt,X)
Xty = np.dot(Xt,y)
beta = np.linalg.solve(XtX,Xty)
print(beta)
# Make predictions for the last 10 rows in the data set
for data,actual in zip(X0[d:],y0[d:]):
    x = np.array([data])
    prediction = np.dot(x,beta)
    print('prediction = '+str(prediction[0,0])+' actual = '+str(actual))

The original data set consists of over 500 entries. To test the accuracy of the predictions made by the linear regression model we use all but the last 10 data entries to build the regression model and compute β. Once we have constructed the β vector we use it to make predictions for the last 10 input values and then compare the predicted home prices against the actual home prices from the data set.

Here are the outputs produced by the program:

[[ -4.14106096e+03]
[ 3.55197583e+00]
[ 1.66328263e+03]
[ 1.45465644e+04]
[ 6.77755381e+03]
[ 6.58750520e+03]
[ 4.44683380e+03]
[ 5.60834856e+03]
[ 1.27979572e+04]
[ 1.24091640e+04]
[ 4.19931185e+03]
[ 9.42215457e+03]]
prediction = 97360.6550969 actual = 82500.0
prediction = 71774.1659014 actual = 83000.0
prediction = 92359.0891976 actual = 84000.0
prediction = 77748.2742379 actual = 85000.0
prediction = 91015.5903066 actual = 85000.0
prediction = 97545.1179047 actual = 91500.0
prediction = 97360.6550969 actual = 94000.0
prediction = 106006.800756 actual = 103000.0
prediction = 92451.6931269 actual = 105000.0
prediction = 73458.2949381 actual = 105000.0

References:

https://www.educba.com/numpy-ndarray/

https://towardsdatascience.com/a-hitchhiker-guide-to-python-numpy-arrays-9358de570121

https://www.tutorialspoint.com/numpy/numpy_indexing_and_slicing.htm

That’s all for this particular post. I will come up with another set of interesting Data Science topics in another post.

Thanks for Reading, keep learning !!!
