GitHub Repository: veeralakrishna/DataCamp-Project-Solutions-Python
Path: blob/master/Bad passwords and the NIST guidelines/notebook.ipynb
Views: 1229
Kernel: Python 3

1. The NIST Special Publication 800-63B

If, 50 years ago, you needed to come up with a secret password, you were probably part of a secret espionage organization or (more likely) you were pretending to be a spy when playing as a kid. Today, many of us are forced to come up with new passwords all the time when signing into sites and apps. As a password inventor it is your responsibility to come up with good, hard-to-crack passwords. But it is also in the interest of sites and apps to make sure that you use good passwords. The problem is that it's really hard to define what makes a good password. However, the National Institute of Standards and Technology (NIST) knows what the second best thing is: to make sure you're at least not using a bad password.

In this notebook, we will go through the rules in NIST Special Publication 800-63B, which details what checks a verifier (what the NIST calls a second party responsible for storing and verifying passwords) should perform to make sure users don't pick bad passwords. We will go through the passwords of users from a fictional company and use Python to flag the users with bad passwords. But the fact that we are able to do this at all means the fictional company is already breaking one of the rules of 800-63B:

Verifiers SHALL store memorized secrets in a form that is resistant to offline attacks. Memorized secrets SHALL be salted and hashed using a suitable one-way key derivation function.

That is, never save users' passwords in plaintext; always store them salted and hashed! Keeping this in mind for the next time we're building a password management system, let's load in the data.
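As a brief aside before loading the data, the salted-and-hashed storage that the quoted rule asks for could look roughly like the sketch below. It is not part of the original notebook: the function names, the 16-byte salt, and the iteration count are illustrative assumptions, and PBKDF2 from Python's standard library is only one possible key derivation function.

```python
import hashlib
import hmac
import os

# Illustrative work factor (an assumption, not from the notebook); tune it for your hardware.
PBKDF2_ITERATIONS = 200_000


def hash_password(password, salt=None):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every user
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, key


def verify_password(password, salt, expected_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


# Only the salt and the derived key would be stored, never the password itself.
salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
print(verify_password("wrong guess", salt, key))
```

With storage like this, the plaintext checks in the rest of the notebook would have to run when a user sets a password, since the verifier can no longer read stored passwords back.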

Warning: The list of passwords and the fictional user database both contain real passwords leaked from real websites. These passwords have not been filtered in any way and include words that are explicit, derogatory and offensive.

    # Importing the pandas module
    import pandas as pd

    # Loading in datasets/users.csv
    users = pd.read_csv('datasets/users.csv')

    # Printing out how many users we've got
    print(users)

    # Taking a look at the 12 first users
    users.head(12)

         id   user_name           password
    0    1    vance.jennings      joobheco
    1    2    consuelo.eaton      0869347314
    2    3    mitchel.perkins     fabypotter
    3    4    odessa.vaughan      aharney88
    4    5    araceli.wilder      acecdn3000
    5    6    shawn.harrington    5278049
    6    7    evelyn.gay          master
    7    8    noreen.hale         murphy
    8    9    gladys.ward         lwsves2
    9    10   brant.zimmerman     1190KAREN5572497
    10   11   leanna.abbott       aivlys24
    11   12   milford.hubbard     hubbard
    12   13   mamie.fox           mitonguito
    13   14   jamie.cochran       310356
    14   15   nathaniel.robinson  angelmajo
    15   16   lorrie.gay          oZ4k0QE
    16   17   domingo.dyer        chelsea
    17   18   martin.pacheco      zvc1939
    18   19   shelby.massey       nickgd
    19   20   rosella.barrett     O2gv3LlcfG
    20   21   karina.morton       dada4943
    21   22   leticia.sanford     cocacola
    22   23   jenny.woodard       woodard
    23   24   brandie.webster     sentry31
    24   25   sabrina.suarez      OTEL3Q0D8y
    25   26   dianna.munoz        AJ9Da
    26   27   julia.savage        ewokzs
    27   28   loretta.bass        WvNV1aKyFEcPe
    28   29   joaquin.walters     YyGjz8E
    29   30   rene.small          toreze00
    ..   ...  ...                 ...
    952  953  christa.morrison    mercedes
    953  954  clarence.britt      28may1997
    954  955  carmela.clayton     N2XTArGRVhKl5
    955  956  royce.combs         Ct3EayTGuHs4Ic2
    956  957  devon.holman        raiders
    957  958  becky.hickman       AQiCWRGL
    958  959  deena.holmes        9xQUdbKNhYsW
    959  960  mark.chandler       ye6491982
    960  961  carmelo.byers       asdfgh
    961  962  mohammed.carpenter  ujXSn2dZWhF
    962  963  rico.valentine      ap10172203
    963  964  angel.jefferson     51183208
    964  965  chrystal.burns      DILWYN
    965  966  irma.vasquez        spider
    966  967  taylor.kent         summer
    967  968  deloris.dixon       seeks
    968  969  julian.gross        Passion!
    969  970  joey.poole          lagrimason
    970  971  noel.montoya        colours
    971  972  josef.hoffman       pharmacy2012
    972  973  jorge.patrick       09196921342
    973  974  rogelio.payne       ilamujoy
    974  975  lucille.stark       buddaball
    975  976  freeman.rose        rangers
    976  977  monica.flores       broktaydrew16
    977  978  autumn.alford       akgkhk82
    978  979  miriam.haynes       jhavonne93
    979  980  genaro.russo        v2PfqcQDleA
    980  981  lora.quinn          antonau
    981  982  elmer.mccormick     goldfish92

    [982 rows x 3 columns]

2. Passwords should not be too short

If we take a look at the first 12 users above we already see some bad passwords. But let's not get ahead of ourselves and start flagging passwords manually. What is the first thing we should check according to the NIST Special Publication 800-63B?

Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length.

Ok, so the passwords of our users shouldn't be too short. Let's start by checking that!

    # Calculating the lengths of users' passwords
    import pandas as pd
    users = pd.read_csv('datasets/users.csv')
    users['length'] = users.password.str.len()

    # Flagging the users with passwords shorter than 8 characters
    users['too_short'] = users['length'] < 8

    # Counting and printing the number of users with too short passwords
    print(users['too_short'].sum())

    # Taking a look at the 12 first rows
    users.head(12)
376
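To see what the length check does on edge cases, here is a small sketch on made-up data (the `sample` frame and the `too_short_strict` column are hypothetical, not part of the notebook). `str.len()` counts Unicode code points, so accented letters count once each, and a missing password has no length at all, which means the plain `< 8` comparison does not flag it as too short.

```python
import pandas as pd

# Hypothetical sample: a short password, a long one, one with
# non-ASCII letters, and a missing value.
sample = pd.DataFrame({"password": ["master", "0869347314", "pässwörd", None]})

# str.len() counts code points; the missing password gets a missing length.
sample["length"] = sample["password"].str.len()
sample["too_short"] = sample["length"] < 8

# Assumption for illustration: treat a missing password as a failure too.
sample["too_short_strict"] = sample["too_short"] | sample["password"].isna()

print(sample)
```

Whether a missing password should count as "too short" is a design decision; the stricter column simply makes that choice explicit.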

3. Common passwords people use

Already this simple rule flagged a couple of offenders among the first 12 users. Next up in Special Publication 800-63B is the rule that

verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised.

  • Passwords obtained from previous breach corpuses.
  • Dictionary words.
  • Repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).
  • Context-specific words, such as the name of the service, the username, and derivatives thereof.

We're going to check these in order and start with passwords obtained from previous breach corpuses, that is, websites where hackers have leaked all the users' passwords. Because many websites don't follow the NIST guidelines and fail to salt and hash their users' passwords, there now exist large lists of the most popular passwords. Let's start by loading in the 10,000 most common passwords which I've taken from here.

    # Reading in the top 10000 passwords
    # (the squeeze=True argument was removed from read_csv in pandas 2.0,
    # so the single column is squeezed into a Series explicitly)
    common_passwords = pd.read_csv("datasets/10_million_password_list_top_10000.txt",
                                   header=None).squeeze("columns")

    # Taking a look at the top 20
    common_passwords.head(20)

    0     123456
    1     password
    2     12345678
    3     qwerty
    4     123456789
    5     12345
    6     1234
    7     111111
    8     1234567
    9     dragon
    10    123123
    11    baseball
    12    abc123
    13    football
    14    monkey
    15    letmein
    16    696969
    17    shadow
    18    master
    19    666666
    Name: 0, dtype: object
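One thing worth keeping in mind when loading a password list with `read_csv` is type inference. The sketch below uses a tiny made-up list (`raw` is hypothetical) to show how default parsing can turn digit-only entries into numbers and entries such as "null" into missing values, so that exact string matching against them fails. Passing `dtype=str` and `keep_default_na=False` is one way to keep every entry verbatim; whether the real file needs this depends on its contents.

```python
import io

import pandas as pd

# Hypothetical miniature password list; the real file has 10,000 entries.
raw = "123456\n111111\nnull\n"

# Default parsing: digits are read as numbers and "null" as a missing value.
loose = pd.read_csv(io.StringIO(raw), header=None)[0]

# Forcing strings and disabling NA detection keeps each line as typed.
strict = pd.read_csv(
    io.StringIO(raw), header=None, dtype=str, keep_default_na=False
)[0]

candidates = pd.Series(["123456", "null", "joobheco"])
print(candidates.isin(loose).tolist())
print(candidates.isin(strict).tolist())
```

In the output above the list column has dtype `object`, because the file mixes digits and letters, so the passwords are already stored as strings there.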

4. Passwords should not be common passwords

The list of passwords was ordered, with the most common passwords first, and so we shouldn't be surprised to see passwords like 123456 and qwerty above. As hackers also have access to this list of common passwords, it's important that none of our users use these passwords!

Let's flag all the passwords in our user database that are among the top 10,000 used passwords.

    # Flagging the users with passwords that are common passwords
    users['common_password'] = users['password'].isin(common_passwords)

    # Counting and printing the number of users using common passwords
    print(users['common_password'].sum())

    # Taking a look at the 12 first rows
    users.head(12)
129

5. Passwords should not be common words

Ay ay ay! It turns out many of our users use common passwords, and of the first 12 users there are already two. However, as most common passwords also tend to be short, they were already flagged as being too short. What is the next thing we should check?

Verifiers SHALL compare the prospective secrets against a list that contains [...] dictionary words.

This follows the same logic as before: It is easy for hackers to check users' passwords against common English words and therefore common English words make bad passwords. Let's check our users' passwords against the top 10,000 English words from Google's Trillion Word Corpus.

    # Reading in a list of the 10000 most common words
    # (squeeze=True was removed from read_csv in pandas 2.0)
    words = pd.read_csv("datasets/google-10000-english.txt",
                        header=None).squeeze("columns")

    # Flagging the users with passwords that are common words
    users['common_word'] = users['password'].str.lower().isin(words)

    # Counting and printing the number of users using common words as passwords
    print(users['common_word'].sum())

    # Taking a look at the 12 first rows
    users.head(12)
137

6. Passwords should not be your name

It turns out many of our passwords were common English words too! Next up on the NIST list:

Verifiers SHALL compare the prospective secrets against a list that contains [...] context-specific words, such as the name of the service, the username, and derivatives thereof.

Ok, so there are many things we could check here. One thing to notice is that our users' usernames consist of their first names and last names separated by a dot. For now, let's just flag passwords that are the same as either a user's first or last name.

    # Extracting first and last names into their own columns
    users['first_name'] = users['user_name'].str.extract(r'(^\w+)', expand=False)
    users['last_name'] = users['user_name'].str.extract(r'(\w+$)', expand=False)

    # Flagging the users with passwords that match their names
    users['uses_name'] = (users['password'] == users['first_name']) | (users['password'] == users['last_name'])

    # Counting and printing the number of users using names as passwords
    # (.sum() counts the True flags; .count() would count all 982 non-null rows)
    print(users['uses_name'].sum())

    # Taking a look at the 12 first rows
    users.head(12)
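The name check can be tried out on a few rows in isolation. The sketch below uses hypothetical sample data (the capitalised "Woodard" is made up for illustration) and two variations on the approach described above: it splits the username on the dot rather than using regular expressions, and it lower-cases the password before comparing, so that simple capitalisation does not hide a match.

```python
import pandas as pd

# Hypothetical sample in the same "first.last" username format.
sample = pd.DataFrame({
    "user_name": ["milford.hubbard", "jenny.woodard", "vance.jennings"],
    "password": ["hubbard", "Woodard", "joobheco"],
})

# Split each username once on the dot into first and last name.
names = sample["user_name"].str.split(".", n=1, expand=True)
sample["first_name"] = names[0]
sample["last_name"] = names[1]

# Compare case-insensitively (an assumption; the strict comparison is case-sensitive).
lowered = sample["password"].str.lower()
sample["uses_name"] = (lowered == sample["first_name"]) | (lowered == sample["last_name"])

# Summing a boolean column counts the True values.
print(sample["uses_name"].sum())
```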

7. Passwords should not be repetitive

Milford Hubbard (user number 12 above), what were you thinking!? Ok, so the last thing we are going to check is a bit tricky:

verifiers SHALL compare the prospective secrets [so that they don't contain] repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).

This is tricky to check because what is repetitive is hard to define. Is 11111 repetitive? Yes! Is 12345 repetitive? Well, kind of. Is 13579 repetitive? Maybe not..? Checking for repetitiveness can be arbitrarily complex, but here we're only going to do something simple. We're going to flag all passwords that contain the same character 4 or more times in a row.

    # Flagging the users with passwords with >= 4 repeats
    users['too_many_repeats'] = users['password'].str.contains(r'(.)\1\1\1')

    # Taking a look at the users with too many repeats
    users[users['too_many_repeats']]
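To make the pattern concrete: `(.)` captures any single character and each `\1` demands that same character again, so the expression matches four identical characters in a row. The sketch below applies it with Python's `re` module to the examples discussed above. The second function, `has_sequential_run`, is a hypothetical extension that is not part of the notebook; it shows one simple way the "sequential characters" half of the rule might be approached, by looking for characters whose code points increase by one.

```python
import re

# The same character four times in a row.
REPEAT_PATTERN = re.compile(r"(.)\1\1\1")


def has_repeats(password):
    """True if any character occurs at least four times consecutively."""
    return REPEAT_PATTERN.search(password) is not None


def has_sequential_run(password, run_length=4):
    """True if `run_length` consecutive characters each step up by one code point."""
    streak = 1
    for prev, curr in zip(password, password[1:]):
        streak = streak + 1 if ord(curr) - ord(prev) == 1 else 1
        if streak >= run_length:
            return True
    return False


for pw in ["aaaaaa", "1234abcd", "13579", "joobheco"]:
    print(pw, has_repeats(pw), has_sequential_run(pw))
```

The run length of 4 is an arbitrary threshold chosen to mirror the repeat check; descending or keyboard-order sequences would need additional logic.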

8. All together now!

Now we have implemented all the basic tests for bad passwords suggested by NIST Special Publication 800-63B! What's left is just to flag all bad passwords and maybe to send these users an e-mail that strongly suggests they change their password.

    # Flagging all passwords that are bad
    users['bad_password'] = (
        users['too_short']
        | users['common_password']
        | users['common_word']
        | users['uses_name']
        | users['too_many_repeats']
    )

    # Counting and printing the number of bad passwords
    print(sum(users['bad_password']))

    # Looking at the first 25 bad passwords
    users[users['bad_password']]['password'].head(25)
424
    5     5278049
    6     master
    7     murphy
    8     lwsves2
    11    hubbard
    13    310356
    15    oZ4k0QE
    16    chelsea
    17    zvc1939
    18    nickgd
    21    cocacola
    22    woodard
    25    AJ9Da
    26    ewokzs
    28    YyGjz8E
    30    reid
    34    jOYZBs8
    38    wwewwf1
    43    225377
    45    NdZ7E6
    47    CQB3Z
    48    diffo
    51    123456789
    52    y8uM7D6
    56    mikeloo
    Name: password, dtype: object
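Combining the flags can also be written with `any(axis=1)`, which is equivalent to chaining `|` across the columns, and summing each flag column shows how many passwords each rule caught. The sketch below uses a hypothetical three-user frame of flags rather than the real data, so the numbers it prints are not the notebook's results.

```python
import pandas as pd

# Hypothetical flags for three users; the notebook derives these columns from real data.
flags = pd.DataFrame({
    "too_short": [True, False, False],
    "common_password": [True, False, False],
    "common_word": [False, False, False],
    "uses_name": [False, True, False],
    "too_many_repeats": [False, False, False],
})
flag_columns = list(flags.columns)

# A password is bad if any single rule flags it.
flags["bad_password"] = flags[flag_columns].any(axis=1)

# Number of passwords caught by each rule, then the overall total.
print(flags[flag_columns].sum())
print(flags["bad_password"].sum())
```

Because one password can break several rules, the per-rule counts generally add up to more than the total number of bad passwords.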

9. Otherwise, the password should be up to the user

In this notebook, we've implemented the password checks recommended by the NIST Special Publication 800-63B. It's certainly possible to better implement these checks, for example, by using a longer list of common passwords. Also note that the NIST checks in no way guarantee that a chosen password is good, just that it's not obviously bad.

Apart from the checks we've implemented above, the NIST is also clear about which password rules should not be imposed:

Verifiers SHOULD NOT impose other composition rules (e.g., requiring mixtures of different character types or prohibiting consecutively repeated characters) for memorized secrets. Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily (e.g., periodically).

So the next time a website or app tells you to "include both a number, symbol and an upper and lower case character in your password" you should send them a copy of NIST Special Publication 800-63B.

    # Enter a password that passes the NIST requirements
    # PLEASE DO NOT USE AN EXISTING PASSWORD HERE
    new_password = "test@2019"
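To check a single candidate password against the same five rules, the checks could be wrapped in one function, roughly as sketched below. The function name `failed_checks`, the username "jane.doe", and the two tiny word sets are hypothetical stand-ins; a real check would use the full lists loaded earlier, and a password that passes against these toy sets might still fail against them.

```python
import re


def failed_checks(password, user_name, common_passwords, common_words):
    """Return the names of the rules that the password breaks."""
    first_name, _, last_name = user_name.partition(".")
    checks = {
        "too_short": len(password) < 8,
        "common_password": password in common_passwords,
        "common_word": password.lower() in common_words,
        "uses_name": password in (first_name, last_name),
        "too_many_repeats": re.search(r"(.)\1\1\1", password) is not None,
    }
    return [name for name, failed in checks.items() if failed]


# Hypothetical stand-ins for the 10,000-entry lists.
common_passwords = {"123456", "master", "qwerty"}
common_words = {"master", "summer", "pharmacy"}

new_password = "test@2019"
print(failed_checks(new_password, "jane.doe", common_passwords, common_words))
print(failed_checks("master", "evelyn.gay", common_passwords, common_words))
```

An empty list means none of the implemented rules were broken, which, as noted above, only shows that the password is not obviously bad.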