How do I count numpy nans?
This post demonstrates how to count numpy.nan instances in a dataset. It borrows from the answer to the Stack Overflow question here.
import numpy as np
!python --version
Python 3.7.4
Initialize a dataset
First, we’ll initialize a 2D array of 10000 by 10000 ones to play around with. With the default float64 dtype, an array of this size occupies about 800 MB of memory.
data = np.ones((10000,10000)); data, data.shape
(array([[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]]), (10000, 10000))
Replace a random selection of 100 indices with numpy.nan
Next, we can take a random selection of 100 row indices and 100 column indices using numpy’s randint function.
random_rows = [np.random.randint(0,10000,100)]
random_columns = [np.random.randint(0,10000,100)]
random_rows, random_columns
([array([2489, 2232, 9101, 5141, 8159, 4862, 1258, 578, 3545, 6002, 2447,
2654, 8737, 4459, 7562, 7755, 30, 9217, 8159, 3206, 8070, 3869,
493, 554, 7340, 6127, 2894, 2194, 7688, 5580, 6272, 4443, 7499,
6647, 6391, 3980, 9350, 8889, 6701, 6119, 9408, 3741, 9822, 943,
3355, 6495, 4133, 3974, 767, 638, 3816, 6424, 3894, 2285, 4650,
1747, 4414, 934, 6903, 72, 7336, 6886, 8757, 3455, 2987, 1857,
8539, 4877, 7290, 2168, 8699, 3784, 1050, 419, 6522, 331, 6852,
6707, 5405, 6416, 804, 6580, 2666, 8495, 9113, 9860, 6967, 7874,
7953, 8535, 8132, 703, 8393, 5499, 882, 6343, 7166, 773, 7869,
4849])],
[array([1449, 8017, 5184, 2633, 7042, 7816, 4290, 5996, 428, 729, 9694,
4407, 7413, 1387, 5740, 9173, 1576, 1562, 5955, 221, 7362, 7812,
725, 721, 475, 446, 4753, 1752, 2657, 7106, 8727, 7783, 2447,
6598, 849, 528, 2812, 1062, 7311, 1908, 9881, 1644, 7622, 5661,
2994, 6229, 9411, 9725, 453, 3844, 6221, 7172, 6114, 1270, 8570,
514, 3096, 1782, 6512, 7163, 2003, 1463, 8042, 4274, 25, 2756,
3827, 3400, 7097, 2116, 7922, 7810, 2001, 2310, 1143, 99, 755,
9611, 7654, 7215, 1320, 8924, 3520, 2513, 3994, 8836, 3458, 6736,
8653, 6721, 2790, 6165, 1782, 2814, 1164, 5302, 3506, 9960, 2816,
2159])])
Once we have some random indices, populating the data with np.nan is as simple as assigning to those positions.
data[random_rows, random_columns] = np.nan
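One caveat with the approach above: randint samples with replacement, so two draws can land on the same (row, column) pair, in which case fewer than 100 cells would end up as nan. A minimal sketch of one way to guarantee distinct cells is shown below. It assumes a smaller 1000 by 1000 array to keep it quick, and the names rng, small, flat_idx, rows and cols are illustrative, not from the original post.

```python
import numpy as np

# Assumption: a smaller 1000 x 1000 array keeps this sketch quick to run.
rng = np.random.default_rng(seed=0)  # seed picked arbitrarily for reproducibility
small = np.ones((1000, 1000))

# Draw 100 distinct flat positions, then convert them to (row, column) pairs.
flat_idx = rng.choice(small.size, size=100, replace=False)
rows, cols = np.unravel_index(flat_idx, small.shape)

# Every (row, column) pair is unique, so exactly 100 cells become nan.
small[rows, cols] = np.nan
nan_total = np.count_nonzero(np.isnan(small))
print(nan_total)
```

Because the flat positions are sampled without replacement, the count printed here is always 100, whatever the seed.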
Count the numpy.nan instances
To count the nan instances in the dataset, we can call np.isnan, which returns a boolean mask that is True wherever the data is nan and False elsewhere. Then we can use the np.count_nonzero function to count the True entries.
np.count_nonzero(np.isnan(data))
100
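The same mask can also be summed directly, or counted along one axis to get per-row or per-column totals. The sketch below uses a tiny hypothetical array (sample) so the results are easy to verify by eye. It is an illustration of the idea rather than part of the original walkthrough.

```python
import numpy as np

# Hypothetical 2 x 3 array, used only so the counts can be checked by hand.
sample = np.array([[1.0, np.nan, 3.0],
                   [np.nan, np.nan, 6.0]])

mask = np.isnan(sample)  # True where the value is nan

total = np.count_nonzero(mask)               # 3 nan values overall
total_via_sum = int(mask.sum())              # also 3, since True counts as 1
per_column = np.count_nonzero(mask, axis=0)  # nan count in each column: [1, 2, 0]
per_row = np.count_nonzero(mask, axis=1)     # nan count in each row: [1, 2]
```

The axis argument is handy when the question is which rows or columns contain missing values rather than how many there are in total.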
Alternatively, if we invert the True/False mask with the ~ operator, we can count the instances that are not nan.
np.count_nonzero(~np.isnan(data))
99999900
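As a sanity check, the two counts should always add up to the total number of elements (here 100 + 99999900 = 10000 × 10000). That also means the non-nan count can be derived from the array size without building the inverted mask. A small sketch on a hypothetical array (sample):

```python
import numpy as np

# Hypothetical 2 x 3 array with three nan values.
sample = np.array([[1.0, np.nan, 3.0],
                   [np.nan, np.nan, 6.0]])

nan_count = np.count_nonzero(np.isnan(sample))
not_nan_count = np.count_nonzero(~np.isnan(sample))

# Every element is either nan or not nan, so the counts cover the whole array.
assert nan_count + not_nan_count == sample.size

# Equivalent result without allocating a second boolean mask.
not_nan_via_size = sample.size - nan_count
```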