What data is drawn into the data resource and where does it come from?

Annotations

Tim Schütz's picture
August 10, 2023

GitHub Repository

“To empower additional modeling efforts, the complete time series of all daily PVI scores and data are available at https://github.com/COVID19PVI/data.”

12 Key Indicators

“[The authors] assembled U.S. county- and state-level datasets into 12 key indicators across four major domains: current infection rates (infection prevalence, rate of increase), baseline population concentration (daytime density/traffic, residential density), current interventions (social distancing, testing rates), and health and environmental vulnerabilities (susceptible populations, air pollution, age distribution, comorbidities, health disparities, and hospital beds).”

Three types of modeling

“Our modeling efforts directly address the discussion in [6], by contextualizing factors such as racial differences with corrections for socioeconomic factors, health resource allocation, and co-morbidities, plus highlighting place-based risks and resource deficits that might explain spatial distributions. Specifically, three types of modeling efforts were performed and are regularly updated. First, epidemiological modeling on cumulative case- and death-related outcomes provides insights into the epidemiology of the pandemic. Second, dynamic time-dependent modeling provides similar outcome estimates as national-level models, but with county-level resolution. Finally, a Bayesian machine learning approach provides data-driven, short-term forecasts.”

Blackness and PM 2.5

“With respect to factors affecting COVID-19 related mortality, we find that the proportion of Black residents and the PM2.5 index of small-particulate air pollution are the most significant predictors among those included, reinforcing conclusions from previous reports [7]. An increase of one percentage point of Black residents is associated with a 3.3% increase in the COVID-19 death rate. The effect of a 1 µg/m³ increase in PM2.5 is associated with an approximately 16% increase in the COVID-19 death rate, a value at the high end of a previously reported confidence interval from a report in late April 2020 [7] when deaths had reached 38% of the current total.”
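
As a reading aid (not part of the quoted paper): if the reported associations are read as multiplicative rate ratios, which is an assumption typical of log-linear count models and not something the excerpt states, then per-unit effects compound over larger differences. The sketch below applies the quoted 3.3% and 16% figures to two made-up comparisons; the function name and the example differences are hypothetical.

```python
def compounded_increase(pct_per_unit: float, units: float) -> float:
    """Percent increase implied by `units` steps of a multiplicative per-unit effect."""
    ratio = 1.0 + pct_per_unit / 100.0
    return (ratio ** units - 1.0) * 100.0

# Hypothetical comparison: 10 percentage points more Black residents.
black_10pt = compounded_increase(3.3, 10)

# Hypothetical comparison: 2 µg/m³ more PM2.5.
pm25_2ug = compounded_increase(16.0, 2)

print(f"{black_10pt:.1f}% and {pm25_2ug:.1f}%")
```

Under this reading, effects multiply (1.16 × 1.16 for two units of PM2.5) instead of simply adding up, so the implied increase for two units is somewhat more than 32%.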

Machine learning and prediction

“To accurately predict future cases and mortality, it is necessary to account for the fluid nature of the data. Accordingly, we developed a Bayesian spatiotemporal random-effects model that jointly describes the log-observed and log-death counts to build local forecasts. Log-observed cases for a given day are predicted using known covariates (e.g., population density, social distancing metrics), a spatiotemporal random-effect smoothing component, and the time-weighted average number of cases for these counts. This smoothed time-weighted average is related to a Euler approximation of a differential equation; it provides modeling flexibility while approximating potential mechanistic models of disease spread. The smoothed case estimates are used in a similar spatiotemporal model predicting future log-death counts based on a geometric mean estimate of the estimated number of observed cases for the previous seven days as well as the other data streams. The resulting county-level predictions and corresponding confidence intervals are shown (Fig. 1).”
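
Two of the quantities named in this excerpt, a time-weighted average of case counts and a geometric mean over the previous seven days, can be illustrated with a small sketch. This is not the authors' model: the excerpt gives no weights or formulas, so the exponential decay weighting, the function names, and the sample counts below are all assumptions chosen for illustration.

```python
import math
from typing import Sequence

def geometric_mean_last_week(cases: Sequence[float]) -> float:
    """Geometric mean of the final seven values; assumes all are positive."""
    window = list(cases)[-7:]
    return math.exp(sum(math.log(c) for c in window) / len(window))

def time_weighted_average(cases: Sequence[float], decay: float = 0.8) -> float:
    """Weighted mean in which the newest day has weight 1 and each older day
    is down-weighted by `decay` (a hypothetical choice, not the paper's)."""
    newest_first = list(reversed(cases))
    weights = [decay ** age for age in range(len(newest_first))]  # age 0 = newest
    return sum(w * c for w, c in zip(weights, newest_first)) / sum(weights)

# Made-up daily case estimates for one county (illustration only).
daily_cases = [10, 12, 15, 20, 24, 30, 36, 40]
gm = geometric_mean_last_week(daily_cases)
twa = time_weighted_average(daily_cases)
print(round(gm, 1), round(twa, 1))
```

Because the geometric mean averages on the log scale, it pairs naturally with a model of log counts, and it is pulled less by a single high day than an arithmetic mean would be.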

Source: https://www.researchgate.net/publication/343642027_The_COVID-19_Pandemic...

Tim Schütz's picture
August 10, 2023

“Data sources in the current model (version 11.2.1) include the Social Vulnerability Index (SVI) of the Centers for Disease Control and Prevention (CDC) for emergency response and hazard mitigation planning (Horney et al. 2017), testing rates from the COVID Tracking Project (Atlantic Monthly Group 2020), social distancing metrics from mobile device data (https://www.unacast.com/covid19/social-distancing-scoreboard), and dynamic measures of disease spread and case numbers (https://usafacts.org/issues/coronavirus/). Methodological details concerning the integration of data streams—plus the complete, daily time series of all source data since February 2020 and resultant PVI scores—are maintained on the public Github project page (COVID19PVI 2020). Over this period, the PVI has been strongly associated with key vulnerability-related outcome metrics (by rank-correlation), with updates of its performance assessment posted with model updates alongside data at the Github project page (COVID19PVI 2020).”
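
The excerpt says the PVI was assessed against outcome metrics "by rank-correlation" but does not say which statistic was used. As one common possibility, the sketch below computes Spearman's rho in plain Python. The choice of Spearman, the function names, and the five made-up county values are assumptions for illustration, not data or code from the project.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Made-up index scores and outcomes for five counties (illustration only).
pvi_scores = [0.31, 0.45, 0.52, 0.60, 0.72]
deaths_per_100k = [12, 30, 25, 41, 58]
rho = spearman(pvi_scores, deaths_per_100k)
print(round(rho, 2))
```

A rank correlation only asks whether counties with higher scores also tend to have worse outcomes, so it does not require the relationship to be linear.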

Source: https://ehp.niehs.nih.gov/doi/10.1289/EHP8690

Aiden Browne's picture
May 31, 2022

The CalEPA Regulated Site Portal (CalEPA RSP) does not generate any data of its own; it combines data from 7 state data sets and 2 federal data sets.

State Data Sets: Cal/OSHA, CERS, CIWQS, EnviroStor, GeoTracker, SMARTS, and SWIS

Federal Data Sets: EIS and TRI

Margaret Tebbe's picture
March 17, 2022

This database draws on a wide variety of data, most of it collected by the EPA itself. Users can search for facilities regulated under the following systems:

  • Risk Management Plan (RMP)
  • Toxic Release Inventory (TRI)
  • National Pollutant Discharge Elimination System (NPDES) - under the Clean Water Act
  • ICIS-Air
  • Resource Conservation and Recovery Act (RCRA) - hazardous waste
  • Safe Drinking Water Act (SDWA)
  • Superfund Enterprise Management System (SEMS)
  • Clean Air Markets Division Business System (CAMDBS)
  • Greenhouse Gas Reporting Program (GHGRP)
  • Emissions Inventory System
  • Toxic Substances Control Act (TSCA)

When looking at individual facilities, the database provides detailed facility reports, enforcement case reports (civil and criminal), air pollutant reports, effluent charts, pollutant loading reports, effluent limit exceedance reports, CWA program area reports, permit limit reports, and other facility documents as available. The database provides easy ways to download and map the data. It also allows users to narrow facility searches using demographic data from EJScreen (also maintained by the EPA), the U.S. Census, and tribal land data.

Users can also look for information on federal administrative and judicial enforcement actions through an enforcement case search.

Margaux Fisher's picture
February 28, 2022

The Student Health Index draws on publicly available data that are up to date at the statewide level. Sources include the University of California San Francisco Health Atlas, the American Community Survey, the U.S. Census Bureau, the California Department of Education’s Downloadable Data Files site, and the CDC.

Detailed list of sources:

PLACES Project, CDC (available through the UCSF Health Atlas)

CalEnviroScreen (available through the UCSF Health Atlas)

Opportunity Atlas (available through the UCSF Health Atlas)

Health Resources and Services Administration (available through the UCSF Health Atlas)

American Community Survey (available through the UCSF Health Atlas)

California Department of Education’s Downloadable Data Files site

Kidsdata.org

February 22, 2022

The EM-DAT disaster database is compiled from a wide variety of sources, including UN agencies, NGOs, insurance companies, research institutes, and press agencies. The compilation process prioritizes data from UN agencies, the International Federation of Red Cross and Red Crescent Societies, and government agencies. Entries are reviewed before consolidation, and data are checked and incorporated daily. More routine data checking and management also takes place monthly, with revisions made at the end of each year.

Margaux Fisher's picture
February 12, 2022

The data included in the index are drawn from publicly available sources that are up to date at the census-tract level statewide for California. To be included in the index, the data must also be demonstrably correlated with life expectancy and be actionable in some way (e.g., through policy).

These sources included:

1. U.S. Census Bureau's American Community Survey (ACS)

2. California Environmental Protection Agency (CalEPA)

3. US Department of Housing and Urban Development (HUD)

4. GreenInfo Network (parks)

5. The National Land Cover Database (tree canopy)

6. US Department of Agriculture (supermarket access)

7. US Environmental Protection Agency (retail density)

8. University of California, Berkeley (voter participation)

Virginia Commonwealth University also provided access to its analysis of life expectancy at the California census tract level.

California healthy places index

Tim Schütz's picture
February 11, 2022

“The Louisiana Tumor Registry (LTR) collects information from the entire state on the incidence of cancer. This information includes the types of cancer (morphology, grade, and behavior), anatomic location, extent of cancer at the time of diagnosis (stage), treatment, and outcomes (survival and mortality).”

“[A]ny health care facility or provider diagnosing or treating cancer patients shall report each case of cancer to the registry. It also protects health care facilities and providers that disclose confidential data in good faith to the LTR from damages arising from such disclosures.” (see cancer reporting)