This post demonstrates how to count numpy.nan instances in a dataset. It borrows from an answer to a Stack Overflow question.

import numpy as np
!python --version

Python 3.7.4


## Initialize a dataset

First, we’ll initialize a 2D array of 10,000 by 10,000 ones to play around with.

data = np.ones((10000,10000)); data, data.shape

(array([[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]]), (10000, 10000))


## Replace a random selection of 100 indices with numpy.nan

Next, we can draw a random selection of 100 row indices and 100 column indices using NumPy’s randint function.

random_rows = [np.random.randint(0,10000,100)]
random_columns = [np.random.randint(0,10000,100)]
random_rows, random_columns

([array([2489, 2232, 9101, 5141, 8159, 4862, 1258,  578, 3545, 6002, 2447,
2654, 8737, 4459, 7562, 7755,   30, 9217, 8159, 3206, 8070, 3869,
493,  554, 7340, 6127, 2894, 2194, 7688, 5580, 6272, 4443, 7499,
6647, 6391, 3980, 9350, 8889, 6701, 6119, 9408, 3741, 9822,  943,
3355, 6495, 4133, 3974,  767,  638, 3816, 6424, 3894, 2285, 4650,
1747, 4414,  934, 6903,   72, 7336, 6886, 8757, 3455, 2987, 1857,
8539, 4877, 7290, 2168, 8699, 3784, 1050,  419, 6522,  331, 6852,
6707, 5405, 6416,  804, 6580, 2666, 8495, 9113, 9860, 6967, 7874,
7953, 8535, 8132,  703, 8393, 5499,  882, 6343, 7166,  773, 7869,
4849])],
[array([1449, 8017, 5184, 2633, 7042, 7816, 4290, 5996,  428,  729, 9694,
4407, 7413, 1387, 5740, 9173, 1576, 1562, 5955,  221, 7362, 7812,
725,  721,  475,  446, 4753, 1752, 2657, 7106, 8727, 7783, 2447,
6598,  849,  528, 2812, 1062, 7311, 1908, 9881, 1644, 7622, 5661,
2994, 6229, 9411, 9725,  453, 3844, 6221, 7172, 6114, 1270, 8570,
514, 3096, 1782, 6512, 7163, 2003, 1463, 8042, 4274,   25, 2756,
3827, 3400, 7097, 2116, 7922, 7810, 2001, 2310, 1143,   99,  755,
9611, 7654, 7215, 1320, 8924, 3520, 2513, 3994, 8836, 3458, 6736,
8653, 6721, 2790, 6165, 1782, 2814, 1164, 5302, 3506, 9960, 2816,
2159])])


After we have some random indices, populating the data with np.nan is as simple as assigning it to those positions.

data[random_rows, random_columns] = np.nan
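
One caveat: the row and column indices are drawn independently and with replacement, so the same (row, column) pair could in principle be drawn twice, which would leave fewer than 100 distinct nan cells. Here is a minimal sketch of one way to guarantee distinct cells. It assumes a smaller array to keep things fast, and the names `small`, `rng`, and `flat_idx` are made up for illustration and are not part of the code above.

```python
import numpy as np

# Assumption: a smaller 1000 x 1000 array, purely to keep the example fast.
small = np.ones((1000, 1000))

# Seeded generator so the run is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)

# Draw 100 distinct flat positions, then convert them to (row, column) pairs.
flat_idx = rng.choice(small.size, size=100, replace=False)
rows, cols = np.unravel_index(flat_idx, small.shape)

small[rows, cols] = np.nan
nan_count = np.count_nonzero(np.isnan(small))
print(nan_count)  # 100
```

Sampling flat positions without replacement and unravelling them is just one option; for a 10000 by 10000 array and only 100 draws, a collision is very unlikely in the first place.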


## Count the np.nan instances

To count the nan instances in the dataset, we can call np.isnan, which returns a boolean mask that is True wherever the data is nan and False everywhere else. Then we can use the np.count_nonzero function to count the True entries in that mask.

np.count_nonzero(np.isnan(data))

100
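
To see what the mask is doing, here is a small sketch on a hypothetical 3 by 4 array (`example` is not part of the dataset above). It also shows why a plain `== np.nan` comparison does not work, since nan never compares equal to itself, and how the `axis` argument gives per-column or per-row counts.

```python
import numpy as np

# Hypothetical 3 x 4 example array with three nan entries.
example = np.ones((3, 4))
example[0, 1] = np.nan
example[2, 1] = np.nan
example[2, 3] = np.nan

# Comparing against np.nan finds nothing, because nan != nan.
naive = np.count_nonzero(example == np.nan)

mask = np.isnan(example)  # boolean array, True where the value is nan

total = np.count_nonzero(mask)               # nan count over the whole array
per_column = np.count_nonzero(mask, axis=0)  # nan count in each column
per_row = mask.sum(axis=1)                   # summing booleans also counts True values

print(naive)       # 0
print(total)       # 3
print(per_column)  # [0 2 0 1]
print(per_row)     # [1 0 2]
```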


Alternatively, if we invert the True/False mask with the ~ operator, we can count the instances that are not nan.

np.count_nonzero(~np.isnan(data))

99999900
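
As a quick sanity check, the two counts should add up to the total number of elements, which for the array above is 100 + 99999900 = 10000 * 10000. A minimal sketch of that check on a hypothetical small array (`example` is an illustrative name):

```python
import numpy as np

# Hypothetical 3 x 4 example array with a single nan entry.
example = np.ones((3, 4))
example[1, 2] = np.nan

nan_count = np.count_nonzero(np.isnan(example))
not_nan_count = np.count_nonzero(~np.isnan(example))

# The two counts partition the array, so they add up to its total size.
print(nan_count, not_nan_count, example.size)  # 1 11 12
```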