2020 General Election Voting by US Census Block Group
PROBLEM AND OPPORTUNITY
In the United States, voting is largely a private matter. A registered voter is given a randomized ballot form or machine to prevent linkage between their voting choices and their identity. This disconnect supports confidence in the election process, but it creates obstacles to analyzing an election. A common solution is to field exit polls, interviewing voters immediately after they leave their polling location. This method is rife with bias, however, and limited in the demographic data it collects directly.
For the 2020 general election, though, most states published their election results for each voting location. These publications were additionally supported by the geographical areas assigned to each location, the voting precincts. As a result, geographic processing can now be applied to project precinct election results onto Census block groups. While precincts have few demographic traits directly, their geographies have characteristics that make them projectable onto U.S. Census geographies. Both state voting precincts and U.S. Census block groups:
are exclusive, and do not overlap
are adjacent, fully covering their corresponding state and potentially county
have roughly the same size in area, population and voter presence
Analytically, a projection of local demographics does not allow conclusions about voters themselves. However, the dataset does allow statements related to the geographies that yield voting behavior. One could say, for example, that an area dominated by a particular voting pattern would have mean traits of age, race, income or household structure.
The dataset that results from this programming provides voting results allocated by Census block groups. The block group identifier can be joined to Census Decennial and American Community Survey demographic estimates.
DATA SOURCES
The state election results and geographies have been compiled by the Voting and Election Science Team on Harvard's dataverse. State voting precincts lie within state and county boundaries.
The Census Bureau, on the other hand, publishes its estimates across a variety of geographic definitions including a hierarchy of states, counties, census tracts and block groups. Their definitions can be found here. The geometric shapefiles for each block group are available here.
The lowest level of this geography changes often and can become obsolete before the next census survey (Decennial or American Community Survey programs). The second lowest census level, block groups, has the benefit of both granularity and stability, however. The 2020 Decennial survey details US demographics into 217,740 block groups with between a few hundred and a few thousand people.
Dataset Structure
The dataset's columns include:
| Column | Definition |
|---|---|
| BLOCKGROUP_GEOID | 12 digit primary key. Census GEOID of the block group row. This code concatenates: 2 digit state; 3 digit county within state; 6 digit Census Tract identifier; 1 digit Census Block Group identifier within tract |
| STATE | State abbreviation, redundant with 2 digit state FIPS code above |
| REP | Votes for Republican party candidate for president |
| DEM | Votes for Democratic party candidate for president |
| LIB | Votes for Libertarian party candidate for president |
| OTH | Votes for presidential candidates other than Republican, Democratic or Libertarian |
| AREA | Square kilometers of area associated with this block group |
| GAP | Total area of the block group, net of area attributed to voting precincts |
| PRECINCTS | Number of voting precincts that intersect this block group |
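The structure of the BLOCKGROUP_GEOID key can be illustrated with a short sketch. The function and class names here are hypothetical and not taken from the dataset's code, and the sample GEOID is illustrative only. GEOIDs are best handled as strings so that leading zeros are preserved.

```python
from typing import NamedTuple


class BlockGroupId(NamedTuple):
    state: str        # 2 digit state FIPS code
    county: str       # 3 digit county code within the state
    tract: str        # 6 digit Census tract identifier
    block_group: str  # 1 digit block group identifier within the tract


def split_blockgroup_geoid(geoid: str) -> BlockGroupId:
    """Split a 12 digit block group GEOID into its four components."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected 12 digits, got {geoid!r}")
    return BlockGroupId(geoid[0:2], geoid[2:5], geoid[5:11], geoid[11])


# Illustrative GEOID only; not a row taken from the dataset.
parts = split_blockgroup_geoid("560430003011")
print(parts)
```

The first 11 characters of the key identify the enclosing tract, which is convenient when joining block group rows to tract level sources.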
ASSUMPTIONS, NOTES AND CONCERNS:
Votes are attributed based upon the proportion of the precinct's area that intersects the corresponding block group. Alternative methods are left to the analyst's initiative.
The 50 states and the District of Columbia are in scope, as these are the U.S. jurisdictions that vote in the general election for the U.S. Presidency.
Three states did not report their results at the precinct level: South Dakota, Kentucky and West Virginia. A dummy block group is added for each of these states to maintain national totals. These states represent 2.1% of all votes cast.
Counties are commonly coded using FIPS codes. However, each election result file may have the county field named differently. Also, three states and the District of Columbia do not share county definitions: Delaware, Massachusetts and Alaska.
Block groups may be used to capture geographies that do not have population, like bodies of water. As a result, block groups without intersecting voting precincts are not uncommon.
In the U.S., elections are administered at a state level, with the Federal Election Commission compiling state totals against the Electoral College weights. The states have liberty, though, to define and change their own voting precincts (see https://en.wikipedia.org/wiki/Electoral_precinct).
The Census Bureau practices "data suppression", filtering some block groups from demographic publication because they do not meet a population threshold. This practice is done to maintain statistical reliability in the estimates and to prevent accidental disclosure of individual respondents. As a result,
the shape files for state block groups may have additional block groups not available in demographic estimates.
ignoring the suppressed block groups will cause statistical bias for these smallest geographies
As written, this projection takes more than 6 days to complete on a familiar Intel-64 based laptop. Its performance would benefit from:
Running states in parallel rather than serially
Looking for intersecting precincts within the shared county rather than state level
Allocation details cause challenges in efforts to tie totals to state and national summaries. By allocating each of 233,866 detailed block groups based on area, many double precision proportions are applied to original integer counts. The allocations themselves are then not integers and may not sum exactly to the reported state election counts.
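The area-based attribution described in these notes can be sketched as follows. This is a minimal illustration rather than the project's code: it assumes axis-aligned rectangles in place of real precinct and block group polygons, and all names and numbers are made up.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    """Area of a rectangle; degenerate or inverted rectangles have zero area."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles (zero when they do not overlap)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def allocate_votes(precinct: Rect, votes: int,
                   block_groups: Dict[str, Rect]) -> Dict[str, float]:
    """Share a precinct's votes by the proportion of its area in each block group."""
    precinct_area = area(precinct)
    if precinct_area == 0:
        raise ValueError("precinct has no area")
    return {
        key: votes * intersection_area(precinct, shape) / precinct_area
        for key, shape in block_groups.items()
    }


# One precinct spanning two hypothetical block groups, "A" and "B"
precinct = (0.0, 0.0, 3.0, 1.0)
block_groups = {"A": (0.0, 0.0, 1.0, 1.0), "B": (1.0, 0.0, 3.0, 1.0)}
allocated = allocate_votes(precinct, 100, block_groups)
print(allocated)
```

Here one third of the precinct falls in "A" and two thirds in "B", so 100 integer votes become fractional allocations. This is the reason block group totals may differ slightly from reported counts.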
RECOGNITION
Special thanks to the meticulous efforts of:
The Voting and Election Science Team (University of Florida, Wichita State University) (https://dataverse.harvard.edu/dataverse/electionscience)
@data{DVN/K7760H_2020, author = {Voting and Election Science Team}, publisher = {Harvard Dataverse}, title = {{2020 Precinct-Level Election Results}}, year = {2020}, version = {V29}, doi = {10.7910/DVN/K7760H}, url = {https://doi.org/10.7910/DVN/K7760H} }
MIT's Election Data and Science Lab MEDSL (https://dataverse.harvard.edu/dataverse/medsl)
“U.S. Census TIGER/Line Files for Block Groups 2021.” Index of /Geo/Tiger/TIGER2021/BG, 22 Sept. 2021, https://www2.census.gov/geo/tiger/TIGER2021/BG/.
LICENSE
This code is available subject to the MIT Open Source License
SUMMARY STATISTICS
State Republican Democrat Libertarian Other Precincts Block Groups
AL 1,441,170 849,624 25,176 7,312 1,972 3,925
AK 189,951 153,778 8,897 4,943 441 504
AZ 1,661,686 1,672,143 51,465 0 1,489 4,773
AR 760,647 423,932 13,133 21,357 2,591 2,294
CA 6,006,428 11,110,493 187,907 192,232 20,799 25,607
CO 1,364,607 1,804,352 52,460 35,561 3,215 4,058
CT 714,717 1,080,831 20,230 8,079 741 2,716
DE 200,603 296,268 5,000 2,139 434 706
DC 18,586 317,323 2,036 6,411 144 571
FL 5,668,731 5,297,045 70,324 54,769 6,010 13,388
GA 2,461,837 2,474,507 62,138 0 2,679 7,446
HI 196,864 366,130 5,539 5,936 262 1,083
ID 554,119 287,021 16,404 9,737 935 1,284
IL 2,446,891 3,471,915 66,544 48,088 10,083 9,898
IN 1,729,857 1,242,498 58,901 0 5,166 5,290
IA 897,672 759,061 19,637 14,501 1,661 2,703
KS 771,406 570,323 30,574 0 4,070 2,461
LA 1,255,776 856,034 21,645 14,607 3,753 4,294
ME 360,767 435,070 14,120 9,412 573 1,184
MD 976,414 1,985,023 33,488 42,106 2,043 4,079
MA 1,167,202 2,382,202 47,013 34,985 2,173 5,116
MI 2,649,859 2,804,036 60,406 23,907 4,756 8,386
MN 1,484,065 1,717,077 34,976 41,053 4,110 4,706
MS 756,764 539,398 8,026 9,571 1,764 2,445
MO 1,718,736 1,253,014 41,205 12,202 3,733 5,031
MT 343,602 244,786 15,252 0 666 900
NE 556,846 374,583 20,283 0 1,386 1,648
NV 669,890 703,486 14,783 17,217 2,094 1,963
NH 365,660 424,937 13,236 0 321 997
NJ 1,883,274 2,608,335 31,677 26,067 726 6,599
NM 401,894 501,614 12,585 7,872 1,917 1,614
NY 3,251,997 5,244,886 60,383 74,987 15,376 16,070
NC 2,758,775 2,684,292 48,678 33,059 2,662 7,111
ND 235,751 115,042 9,371 1,860 422 632
OH 3,154,834 2,679,165 67,569 18,812 8,941 9,472
OK 1,020,280 503,890 24,731 11,798 1,948 3,374
OR 958,448 1,340,383 41,582 33,908 1,331 2,970
PA 3,378,442 3,460,475 79,432 0 9,150 10,173
RI 199,922 307,486 5,053 5,296 423 792
SC 1,385,103 1,091,541 27,916 8,769 2,263 3,408
TN 1,852,475 1,143,711 29,877 26,926 1,962 4,562
TX 5,890,570 5,259,281 126,269 44,299 9,014 18,638
UT 865,140 560,282 38,447 42,773 2,424 2,020
VT 112,704 242,820 3,608 8,296 284 552
VA 1,962,430 2,413,568 64,761 19,765 2,477 5,963
WA 1,584,651 2,369,612 80,500 52,868 7,464 5,311
WI 1,610,184 1,630,866 38,491 18,500 7,090 4,692
WY 193,559 73,491 5,768 3,947 481 457
Total 72,091,786 80,127,630 1,817,496 1,055,927 166,419 233,86
2022 Social Vulnerability by US Census Block Group
blockgroupvulnerability
OPPORTUNITY
The US Centers for Disease Control (CDC) publishes a set of percentiles that compare US geographies by vulnerability across household, socioeconomic, racial/ethnic and housing themes. These Social Vulnerability Indexes (SVI) were originally intended to help public health officials and emergency response planners identify communities that will need support around an event. They are generally valuable for any public interest that wants to relate itself to needy communities by geography.
The SVI publication and its basis variables are provided at the Census tract level of geographic detail. The Census' American Community Survey is available down to the block group level, however. Recasting the SVI methods at this lower level of geography allows it to be tied to thousands of other demographic variables available.
Because the SVI relies on ACS variables only available at the tract level, a projection model needs to be applied to approximate its results using block group level ACS variables. The blockgroupvulnerability dataset predicts the results of the CDC's logic at the block group level, as a new contribution to the Open Environments blockgroup series available on Harvard's dataverse platform.
DATA
The CDC's annual SVI publication starts with 23 simple derivations using 50 ACS Census variables. Next the SVI process ranks census geographies to calculate a rank for each, where Percentile Rank = (Rank-1) / (N-1). The SVI themes are then calculated at the tract level as a percentile rank of a sum of the percentile ranks of the first level ACS derived variables. Finally, the overall ranking is taken as the sum of the theme percentile rankings.
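The ranking steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the CDC's implementation: tied values are assumed to share the lowest rank among them, the function names are hypothetical, and the input values are made up.

```python
from bisect import bisect_left
from typing import List, Sequence


def percentile_ranks(values: Sequence[float]) -> List[float]:
    """Percentile Rank = (Rank - 1) / (N - 1), with Rank 1 for the smallest value.

    Assumption: tied values share the lowest rank among them.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two geographies to rank")
    ordered = sorted(values)
    # bisect_left counts the values strictly less than v, which is Rank - 1
    return [bisect_left(ordered, v) / (n - 1) for v in values]


def theme_rank(variable_columns: Sequence[Sequence[float]]) -> List[float]:
    """Percentile rank of the per-geography sum of variable percentile ranks."""
    ranked = [percentile_ranks(col) for col in variable_columns]
    sums = [sum(per_geography) for per_geography in zip(*ranked)]
    return percentile_ranks(sums)


# Made-up values for two variables across four tracts
poverty = [10.0, 30.0, 20.0, 40.0]
unemployment = [1.0, 3.0, 7.0, 5.0]
print(percentile_ranks(poverty))
print(theme_rank([poverty, unemployment]))
```

With N geographies the lowest value always ranks 0 and the highest 1, which is why the themes range over [0,1].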
The SVI data publication
is keyed by geography (7 cols) where ultimately the Census Tract FIPS code is 2 State + 3 County + 4 Tract + 2 Tract Decimals eg, 56043000301 is 56 Wyoming, 043 Washakie County, Tract 3.01
republishes Census demographics called 'adjunct variables' including area, population, households and housing units from the ACS, and daytime population taken from LandScan 2020 estimates
derives 23 SVI variables from 50 ACS 5 Year variables
with each having an estimate (E_), estimate percentage (EP_), margin of error (M_), margin percentage (MP_) and flag variable (F_) for those greater than 90% or less than 10%
provides the final 4 themes and a composite SVI percentile
annually

    # Base names: geography keys, SVI variables, theme totals and adjunct variables
    svi_vars = (
        ['ST', 'STATE', 'ST_ABBR', 'STCNTY', 'COUNTY', 'FIPS', 'LOCATION']
        + ['SNGPNT', 'LIMENG', 'DISABL', 'AGE65', 'AGE17', 'NOVEH', 'MUNIT', 'MOBILE',
           'GROUPQ', 'CROWD', 'UNINSUR', 'UNEMP', 'POV150', 'NOHSDP', 'HBURD', 'TWOMORE',
           'OTHERRACE', 'NHPI', 'MINRTY', 'HISP', 'ASIAN', 'AIAN', 'AFAM', 'NOINT']
        + ['TOTAL', 'THEME1', 'THEME2', 'THEME3', 'THEME4']
        + ['AREA_SQMI', 'TOTPOP', 'DAYPOP', 'HU', 'HH']
    )

    # Every column name the SVI publication is expected to contain
    knowns = (
        svi_vars
        # Estimates, the result of calculations against ACS variables
        + ['E_' + v for v in svi_vars]
        # Flag 0,1 whether this geography is in the 90th percentile rank (it is vulnerable)
        + ['F_' + v for v in svi_vars]
        # Margin of error for ACS calculations
        + ['M_' + v for v in svi_vars]
        # Margin of error for ACS calculations, as percentage
        + ['MP_' + v for v in svi_vars]
        # Estimates of ACS calculations, as percentage
        + ['EP_' + v for v in svi_vars]
        # Estimated percentile ranks
        + ['EPL_' + v for v in svi_vars]
        # Sum across variable percentile ranks
        + ['SPL_' + v for v in svi_vars]
        # Percentile rank of the sum of percentile ranks
        + ['RPL_' + v for v in svi_vars]
    )

    # List any published columns not accounted for above.
    # svitract is assumed to be the tract level SVI DataFrame loaded earlier.
    [c for c in svitract.columns if c not in knowns]
The SVI themes range over [0,1], but the CDC uses -999 as an NA value; this is set for ~800, or 1%, of tracts, which have no total population.
The themes are numbered:
Socioeconomic Status – RPL_THEME1
Household Characteristics – RPL_THEME2
Racial & Ethnic Minority Status – RPL_THEME3
Housing Type & Transportation – RPL_THEME4
Unlike Census data, the CDC ranks Puerto Rico and Tribal tracts separately from the rest of the US.
The themes with their variables and ACS sources are as follows:
Theme SVI Variable ACS Table ACS Variables
Socioeconomic E_UNINSUR S2701 S2701_C04_001E
Socioeconomic E_UNEMP DP03 DP03_0005E
Socioeconomic E_POV150 S1701 S1701_C01_040E
Socioeconomic E_NOHSDP B06009 B06009_002E
Socioeconomic E_HBURD S2503 S2503_C01_028E + S2503_C01_032E + S2503_C01_036E + S2503_C01_040E
Household E_SNGPNT B11012 B11012_010E + B11012_015E
Household E_LIMENG B16005 B16005_007E + B16005_008E + B16005_012E + B16005_013E + B16005_017E + B16005_018E + B16005_022E + B16005_023E + B16005_029E + B16005_030E + B16005_034E + B16005_035E + B16005_039E + B16005_040E + B16005_044E + B16005_045E
Household E_DISABL DP02 DP02_0072E
Household E_AGE65 S0101 S0101_C01_030E
Household E_AGE17 B09001 B09001_001E
Racial & Ethnic E_TWOMORE DP05 DP05_0083E
Racial & Ethnic E_OTHERRACE DP05 DP05_0082E
Racial & Ethnic E_NHPI DP05 DP05_0081E
Racial & Ethnic E_MINRTY DP05 DP05_0071E + DP05_0078E + DP05_0079E + DP05_0080E + DP05_0081E + DP05_0082E + DP05_0083E
Racial & Ethnic E_HISP DP05 DP05_0071E
Racial & Ethnic E_ASIAN DP05 DP05_0080E
Racial & Ethnic E_AIAN DP05 DP05_0079E
Racial & Ethnic E_AFAM DP05 DP05_0078E
Housing E_NOVEH DP04 DP04_0058E
Housing E_MUNIT DP04 DP04_0012E + DP04_0013E
Housing E_MOBILE DP04 DP04_0014E
Housing E_GROUPQ B26001 B26001_001E
Housing E_CROWD DP04 DP04_0078E + DP04_0079E
The Census American Community Survey is updated annually and accessible by API. For this effort, variables commonly used at the block group level were retrieved at the tract level so that a predictive method could be trained there and then applied at the block group level of detail. The specific variables used are shown as lists in the data retrieval functions of the supporting code. The Census' TIGER/Line publication provides the geographic shapes and properties. The TIGER/Line dataset includes:
Geography, position
['STATEFP', 'COUNTYFP', 'TRACTCE', 'GEOID', 'INTPTLAT', 'INTPTLON']
Name with legal/statistical area description
['NAME', 'NAMELSAD', 'MTFCC', 'FUNCSTAT']
Area of land and water in square meters
['ALAND', 'AWATER']
Geographic shape
['geometry']
See https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2020/TGRSHP2020_TechDoc.pdf
The supporting code is maintained on https://github.com/OpenEnvironments/blockgroupvulnerability. In general, variable names within the process are taken from the original SVI and ACS documentation. The variable names in the dataverse publication have the E_ prefix removed, maintaining the published variables' relation to the SVI original.
MODEL
The models that generate this data publication use block group level ACS variables aggregated by the Census to the tract level. The Census TIGER/Line data adds a variable, the land area of each geography, to calculate population density.
For context, there are about 85K tracts in the United States, while there are about 200K block groups. Each tract has between 1,200 and 8,000 people in it while each block group has between 600 and 3,000. Block groups are subdivisions of Census tracts. This level of detail is available for most of the SVI's Census sources, except for variables in the ACS Data Profiles and Subject Tables. These are only available at the tract level.
A model is trained for each of the SVI's four themes as well as its composite. Each is a regressor whose output is converted to its own percentile rank and applied to a block group level version of the ACS and TIGER/Line features. Model performance is measured by comparing the original tract level targets to the block group estimates, aggregated by mean for each tract.
The root mean squared error (RMSE) values for each theme are:

| Theme | RMSE |
|---|---|
| THEME1 | 0.148565 |
| THEME2 | 0.218488 |
| THEME3 | 0.086466 |
| THEME4 | 0.241419 |
| THEMES | 0.154495 |
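The evaluation described here, averaging block group estimates within each tract and comparing them to the tract level targets, can be sketched as below. The function names and all numbers are hypothetical. The sketch relies only on the fact that the first 11 digits of a 12 digit block group GEOID identify its tract.

```python
import math
from collections import defaultdict
from typing import Dict


def tract_means(blockgroup_estimates: Dict[str, float]) -> Dict[str, float]:
    """Average block group estimates within each tract (first 11 GEOID digits)."""
    grouped = defaultdict(list)
    for geoid, value in blockgroup_estimates.items():
        grouped[geoid[:11]].append(value)
    return {tract: sum(vals) / len(vals) for tract, vals in grouped.items()}


def rmse(targets: Dict[str, float], estimates: Dict[str, float]) -> float:
    """Root mean squared error over the tracts present in both mappings."""
    shared = sorted(set(targets) & set(estimates))
    if not shared:
        raise ValueError("no tracts in common")
    squared = [(targets[t] - estimates[t]) ** 2 for t in shared]
    return math.sqrt(sum(squared) / len(squared))


# Made-up block group estimates and tract level targets
estimates = {"560430003011": 0.25, "560430003012": 0.75, "560430003021": 0.5}
targets = {"56043000301": 0.5, "56043000302": 0.75}
by_tract = tract_means(estimates)
score = rmse(targets, by_tract)
print(by_tract, score)
```

Because the targets are percentile ranks on [0,1], an RMSE can be read directly as a typical error in percentile rank units.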
CITATIONS
Centers for Disease Control and Prevention/ Agency for Toxic Substances and Disease Registry/ Geospatial Research, Analysis, and Services Program. CDC/ATSDR Social Vulnerability Index [Insert 2020, 2018, 2016, 2014, 2010, or 2000] Database [Insert US or State]. https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html. Accessed on November 8, 2022.
U.S. Census Bureau. (2020). 2020 American Community Survey 5-year Estimates. Retrieved from API calls to https://api.census.gov/data/2017/acs/acs5?get=NAME,B25077_001M&for=state:*
“TIGER\Line Tract Level Geographies.” Index of /Geo/Tiger/TIGER2020/Tract, US Census Bureau, 1 Feb. 2021, https://www2.census.gov/geo/tiger/TIGER2020/TRACT/.
Flanagan, Barry E.; Gregory, Edward W.; Hallisey, Elaine J.; Heitgerd, Janet L.; and Lewis, Brian (2011) "A Social Vulnerability Index for Disaster Management," Journal of Homeland Security and Emergency Management: Vol. 8: Iss. 1, Article 3. DOI: 10.2202/1547-7355.1792 Available at: http://www.bepress.com/jhsem/vol8/iss1/3
XGBoost, Xgboost.ai, https://xgboost.ai/.
Bryan, Michael B. “Block Group Datasets.” Open Environments Dataverse, Feb. 2022, https://dataverse.harvard.edu/dataverse/openenvironments. https://github.com/OpenEnvironments/blockgroupvulnerabilit
U.S. Select Demographics by Census Block Groups
Overview
This dataset re-shares cartographic and demographic data from the U.S. Census Bureau to provide an obvious supplement to Open Environments Block Group publications. These results do not reflect any proprietary or predictive model. Rather, they extract from Census Bureau results with some proportions and aggregation rules applied. For additional support or more detail, please see the Census Bureau citations below.
Cartographics refer to shapefiles shared in the Census TIGER/Line publications. Block Group areas are updated annually, with major revisions accompanying the Decennial Census at the turn of each decade. These shapes are useful for visualizing estimates as a map and relating geographies based upon geo-operations like overlapping. This data is kept in a geodatabase file format and requires the geopandas package and its supporting fiona and GDAL software.
Demographics are taken from popular variables in the American Community Survey (ACS) including age, race, income, education and family structure. This data simply requires csv reader software or Python's pandas package.
While the demographic data has many columns, the cartographic data has a very large column called "geometry" storing the many-point boundaries of each shape. So, this process saves the data separately, with demographics columns in a csv file and geometry in a gpd file that needs an installation of geopandas, fiona and GDAL software. More details on the ACS variables selected and derivation rules applied can be found in the commentary docstrings in the source code found here: https://github.com/OpenEnvironments/blockgroupdemographics.
## Files
The demographic and cartographic data are saved separately. The demographics columns are in a csv file named YYYYblockgroupdemographics.csv. The cartographic column, 'geometry', is shared as a file named YYYYblockgroupdemographics-geometry.pkl. This file needs an installation of geopandas, fiona and GDAL software.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
koamabayili/VECTRON-author-checklist: VECTRON author checklist
We have done our best to complete the author checklist relating to the use of animals in the hut study. Note that the objective for the hut study was to evaluate the IRS treatment applications for residual efficacy against Anopheles mosquitoes, including the local An. coluzzii mosquito population. Cows were only used to attract mosquitoes into the huts and no tests were carried out directly on the cows. The author checklist is intended for use with studies where experiments are carried out on animals, which is why we have had such difficulty in completing this for the hut study, as many of the questions do not relate to how the cows were used
