import pandas as pd
import numpy as np
(1) Overview
A series of longitudinal studies in the Journal of the American Medical Association (JAMA) , most recently Reuben et al. (2019 ) direct link , have found consistent and robust associations between childhood exposure to lead and adult psychopathology:
In this multidecade, longitudinal study of lead-exposed children, higher childhood blood lead level was associated with greater psychopathology across the life course and difficult adult personality traits. Childhood lead exposure may have long-term consequences for adult mental health and personality. (Reuben et al. 2019 )
Though we won’t explore that association directly here, I wanted to gather two datasets that would be relevant for studying it, to demonstrate some basic data-cleaning steps using Python libraries like pandas
and openpyxl
.
If you’re interested in applying your DSAN 5000 skills to the healthcare sphere, definitely consider studying this issue! I have a close friend, for example, who teaches at a school in East Oakland which is currently facing a lead contamination crisis . As a third dataset on top of the two we clean here, for example, you could try finding data on income by neighborhood—there is a reason why lead contamination is continually “discovered” at schools in neighborhoods like East Oakland and West Baltimore , and not at schools in neighborhoods like Georgetown.
(2) Navigating The .csv
Seas with glob
First, since the .csv
filenames are super long, let’s just use one of my favorite Python libraries, glob
, to automatically get a list of all the .csv
files within the same folder as this notebook, so we don’t even have to manually type the name(s) of the data files in our code!
import glob
glob.glob("data/*" )
['data/SHIP_Emergency_Department_Visits_Related_To_Mental_Health_Conditions_2008-2017.csv',
'data/fakedata.dat',
'data/MD_CountyLevelSummary_2017.xlsx']
csv_fpaths = glob.glob("data/*.csv" )
xlsx_fpaths = glob.glob("data/*.xlsx" )
data_fpaths = csv_fpaths + xlsx_fpaths
# for fpath in all_data_fpaths:
# print(fpath)
for fpath_index, fpath in enumerate (data_fpaths):
print (fpath_index, fpath)
0 data/SHIP_Emergency_Department_Visits_Related_To_Mental_Health_Conditions_2008-2017.csv
1 data/MD_CountyLevelSummary_2017.xlsx
'data/SHIP_Emergency_Department_Visits_Related_To_Mental_Health_Conditions_2008-2017.csv'
SHIP stands (weirdly) for State Health Insurance Assistance Program, a program in the state of Maryland which subsidizes health insurance payments for low-income healthcare recipients. You can find the full info and codebook for the SHIP data here
Here we also see a third way to use glob()
: somewhat like tab completion in the Linux shell, we can type the beginning of the filename followed by the wildcard character *
to obtain the full filename we need. Note that glob.glob()
always returns a list
, so we need to add [0]
at the end to access the first result (even if there is only one result in total!).
ship_fpath = glob.glob("data/SHIP*.csv" )[0 ]
ship_df = pd.read_csv(ship_fpath)
ship_df.head()
0
State
4291.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
1
Allegany
3309.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
2
Anne Arundel
5734.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
3
Baltimore City
10093.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
4
Baltimore County
4210.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
(4) Lead Levels Data from the CDC
Especially given
Requirement already satisfied: openpyxl in /Users/jpj/.pyenv/versions/3.11.5/lib/python3.11/site-packages (3.1.2)
Requirement already satisfied: et-xmlfile in /Users/jpj/.pyenv/versions/3.11.5/lib/python3.11/site-packages (from openpyxl) (1.1.0)
Note: you may need to restart the kernel to use updated packages.
And data on childhood lead levels can be found from the CDC website’s data portal . The most recent sample was conducted in 2021, the results of which are contained in the Excel-format dataset we load here.
xlsx_fpath = glob.glob("data/*.xlsx" )[0 ]
places_df = pd.read_excel(xlsx_fpath, skiprows= 3 )
places_df.rename(columns= {'Unnamed: 0' : 'fips' , 'Unnamed: 1' : 'county' }, inplace= True )
places_df = places_df.rename(columns= {'Unnamed: 0' : 'fips' , 'Unnamed: 1' : 'county' })
rename_map = {
'Unnamed: 0' : 'fips' ,
'Unnamed: 1' : 'county' ,
#'Unnamed: 2': 'child_pop',
'Percent' : 'pct_lead'
}
places_df.rename(columns= rename_map, inplace= True )
cols_to_keep = list (rename_map.values())
places_df = places_df[cols_to_keep]
0
001
Allegany County
0.020888
1
003
Anne Arundel County
0.003220
2
005
Baltimore County
0.009980
3
009
Calvert County
0.006550
4
011
Caroline County
0.019973
5
013
Carroll County
0.006377
6
015
Cecil County
0.006893
7
017
Charles County
0.002288
8
019
Dorchester County
0.027439
9
021
Frederick County
0.003054
10
023
Garrett County
NaN
11
025
Harford County
0.003109
12
027
Howard County
0.008291
13
029
Kent County
NaN
14
031
Montgomery County
0.003826
15
033
Prince George's County
0.012376
16
035
Queen Anne's County
NaN
17
037
St. Mary's County
NaN
18
039
Somerset County
0.015837
19
041
Talbot County
0.012442
20
043
Washington County
0.008881
21
045
Wicomico County
0.013187
22
047
Worcester County
0.012959
23
510
Baltimore (city) County
0.041661
24
NaN
Unknown
NaN
25
NaN
NaN
NaN
26
Notes: 'N/A' indicates data are supressed when...
NaN
NaN
27
Population estimates calculated as population ...
NaN
NaN
28
Data received and processed by CDC as of April...
NaN
NaN
First, we note that only the rows from index 0 to index 23 contain the data we want (the rest contain footnotes added to the cells below the main dataset). So, we use the .iloc
accessor from Pandas to “slice” just these first 24 rows, keeping in mind that Python’s slice operator :
is inclusive-exclusive , meaning that:
The number before the :
will be the first index included, but
The number after the :
will be one greater than the last index included
places_df = places_df.iloc[0 :24 ].copy()
0
001
Allegany County
0.020888
1
003
Anne Arundel County
0.003220
2
005
Baltimore County
0.009980
3
009
Calvert County
0.006550
4
011
Caroline County
0.019973
5
013
Carroll County
0.006377
6
015
Cecil County
0.006893
7
017
Charles County
0.002288
8
019
Dorchester County
0.027439
9
021
Frederick County
0.003054
10
023
Garrett County
NaN
11
025
Harford County
0.003109
12
027
Howard County
0.008291
13
029
Kent County
NaN
14
031
Montgomery County
0.003826
15
033
Prince George's County
0.012376
16
035
Queen Anne's County
NaN
17
037
St. Mary's County
NaN
18
039
Somerset County
0.015837
19
041
Talbot County
0.012442
20
043
Washington County
0.008881
21
045
Wicomico County
0.013187
22
047
Worcester County
0.012959
23
510
Baltimore (city) County
0.041661
places_df['county' ] = places_df['county' ].str .replace(" County" ,"" )
0
001
Allegany
0.020888
1
003
Anne Arundel
0.003220
2
005
Baltimore
0.009980
3
009
Calvert
0.006550
4
011
Caroline
0.019973
5
013
Carroll
0.006377
6
015
Cecil
0.006893
7
017
Charles
0.002288
8
019
Dorchester
0.027439
9
021
Frederick
0.003054
10
023
Garrett
NaN
11
025
Harford
0.003109
12
027
Howard
0.008291
13
029
Kent
NaN
14
031
Montgomery
0.003826
15
033
Prince George's
0.012376
16
035
Queen Anne's
NaN
17
037
St. Mary's
NaN
18
039
Somerset
0.015837
19
041
Talbot
0.012442
20
043
Washington
0.008881
21
045
Wicomico
0.013187
22
047
Worcester
0.012959
23
510
Baltimore (city)
0.041661
places_df['county' ] = places_df['county' ].str .replace('(city)' , 'City' , regex= False )
0
001
Allegany
0.020888
1
003
Anne Arundel
0.003220
2
005
Baltimore
0.009980
3
009
Calvert
0.006550
4
011
Caroline
0.019973
5
013
Carroll
0.006377
6
015
Cecil
0.006893
7
017
Charles
0.002288
8
019
Dorchester
0.027439
9
021
Frederick
0.003054
10
023
Garrett
NaN
11
025
Harford
0.003109
12
027
Howard
0.008291
13
029
Kent
NaN
14
031
Montgomery
0.003826
15
033
Prince George's
0.012376
16
035
Queen Anne's
NaN
17
037
St. Mary's
NaN
18
039
Somerset
0.015837
19
041
Talbot
0.012442
20
043
Washington
0.008881
21
045
Wicomico
0.013187
22
047
Worcester
0.012959
23
510
Baltimore City
0.041661
0 True
1 True
2 True
3 True
4 True
...
1245 False
1246 False
1247 False
1248 False
1249 False
Name: Year, Length: 1250, dtype: bool
ship_df = ship_df.loc[ship_df['Year' ] == 2017 ]
ship_df = ship_df[ship_df['Jurisdiction' ] != "State" ]
ship_df = ship_df[ship_df['Race/ ethnicity' ] == "All races/ ethnicities (aggregated)" ]
ship_df
1
Allegany
3309.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
2
Anne Arundel
5734.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
3
Baltimore City
10093.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
4
Baltimore County
4210.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
5
Calvert
2999.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
6
Caroline
7556.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
7
Carroll
4216.0
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
8
Cecil
9584.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
9
Charles
2817.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
10
Dorchester
11251.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
11
Frederick
3064.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
12
Garrett
7967.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
13
Harford
3020.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
14
Howard
3082.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
15
Kent
13662.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
16
Montgomery
2312.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
17
Prince George's
1955.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
18
Queen Anne's
6119.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
19
Saint Mary's
6173.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
20
Somerset
2696.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
21
Talbot
7661.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
22
Washington
5410.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
23
Wicomico
2897.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
24
Worcester
3502.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
1
Allegany
3309.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
2
Anne Arundel
5734.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
3
Baltimore City
10093.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
4
Baltimore County
4210.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
5
Calvert
2999.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
6
Caroline
7556.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
7
Carroll
4216.0
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
8
Cecil
9584.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
9
Charles
2817.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
10
Dorchester
11251.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
11
Frederick
3064.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
12
Garrett
7967.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
13
Harford
3020.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
14
Howard
3082.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
15
Kent
13662.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
16
Montgomery
2312.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
17
Prince George's
1955.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
18
Queen Anne's
6119.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
19
Saint Mary's
6173.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
20
Somerset
2696.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
21
Talbot
7661.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
22
Washington
5410.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
23
Wicomico
2897.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
24
Worcester
3502.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
ship_df['Jurisdiction' ] = ship_df['Jurisdiction' ].str .replace("Baltimore County" , "Baltimore" )
1
Allegany
3309.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
2
Anne Arundel
5734.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
3
Baltimore City
10093.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
4
Baltimore
4210.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
5
Calvert
2999.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
6
Caroline
7556.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
7
Carroll
4216.0
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
8
Cecil
9584.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
9
Charles
2817.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
10
Dorchester
11251.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
11
Frederick
3064.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
12
Garrett
7967.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
13
Harford
3020.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
14
Howard
3082.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
15
Kent
13662.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
16
Montgomery
2312.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
17
Prince George's
1955.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
18
Queen Anne's
6119.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
19
Saint Mary's
6173.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
20
Somerset
2696.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
21
Talbot
7661.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
22
Washington
5410.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
23
Wicomico
2897.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
24
Worcester
3502.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
ship_df['Jurisdiction' ] = ship_df['Jurisdiction' ].str .replace("Saint" , "St." )
1
Allegany
3309.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
2
Anne Arundel
5734.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
3
Baltimore City
10093.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
4
Baltimore
4210.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
5
Calvert
2999.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
6
Caroline
7556.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
7
Carroll
4216.0
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
8
Cecil
9584.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
9
Charles
2817.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
10
Dorchester
11251.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
11
Frederick
3064.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
12
Garrett
7967.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
13
Harford
3020.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
14
Howard
3082.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
15
Kent
13662.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
16
Montgomery
2312.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
17
Prince George's
1955.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
18
Queen Anne's
6119.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
19
St. Mary's
6173.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
20
Somerset
2696.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
21
Talbot
7661.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
22
Washington
5410.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
23
Wicomico
2897.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
24
Worcester
3502.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
merged_df = places_df.merge(ship_df, left_on= 'county' , right_on= 'Jurisdiction' , how= 'left' , indicator= False )
0
001
Allegany
0.020888
Allegany
3309.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
1
003
Anne Arundel
0.003220
Anne Arundel
5734.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
2
005
Baltimore
0.009980
Baltimore
4210.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
3
009
Calvert
0.006550
Calvert
2999.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
4
011
Caroline
0.019973
Caroline
7556.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
5
013
Carroll
0.006377
Carroll
4216.0
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
6
015
Cecil
0.006893
Cecil
9584.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
7
017
Charles
0.002288
Charles
2817.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
8
019
Dorchester
0.027439
Dorchester
11251.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
9
021
Frederick
0.003054
Frederick
3064.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
10
023
Garrett
NaN
Garrett
7967.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
11
025
Harford
0.003109
Harford
3020.2
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
12
027
Howard
0.008291
Howard
3082.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
13
029
Kent
NaN
Kent
13662.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
14
031
Montgomery
0.003826
Montgomery
2312.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
15
033
Prince George's
0.012376
Prince George's
1955.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
16
035
Queen Anne's
NaN
Queen Anne's
6119.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
17
037
St. Mary's
NaN
St. Mary's
6173.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
18
039
Somerset
0.015837
Somerset
2696.1
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
19
041
Talbot
0.012442
Talbot
7661.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
20
043
Washington
0.008881
Washington
5410.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
21
045
Wicomico
0.013187
Wicomico
2897.6
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
22
047
Worcester
0.012959
Worcester
3502.8
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
23
510
Baltimore City
0.041661
Baltimore City
10093.5
All races/ ethnicities (aggregated)
2017
Mental Health ED visits
merged_df[['pct_lead' ,'Value' ]].corr()
pct_lead
1.00000
0.59498
Value
0.59498
1.00000
Reuben, Aaron, Jonathan D. Schaefer, Terrie E. Moffitt, Jonathan Broadbent, Honalee Harrington, Renate M. Houts, Sandhya Ramrakha, Richie Poulton, and Avshalom Caspi. 2019.
“Association of Childhood Lead Exposure With Adult Personality Traits and Lifelong Mental Health .” JAMA Psychiatry 76 (4): 418–25.
https://doi.org/10.1001/jamapsychiatry.2018.4192 .