I’ve been playing with the public covid data on BigQuery, which is available here. I was curious how the spread of covid relates to temperature. The simple intuition: when it’s either too hot or too cold, people spend more time indoors, and that facilitates the spread of the virus. If so, you’d expect relatively more waves to start in U.S. counties where temperatures are too low or too high than in other counties at the same time. Let’s see if that’s the case.

I set up a little indicator that tags the beginnings and ends of covid waves. We’re looking for the valley points right at the beginning and at the end of a wave (I’ll call them takeoff and leveloff points).

(1) Start with the cumulative number of cases time series from the public covid data, per county, cumulative_cases[t]

(2) Create a time series that is the 14-day difference, then just use points on Mondays and drop all others: cases[t] = cumulative_cases[t] – cumulative_cases[t-14]

(3) Create a time series that tags large percentage changes from one Monday to the next and is 0 otherwise: change[t] = { 1 if cases[t] / cases[t-1] >= 1.2; -1 if cases[t] / cases[t-1] falls below a corresponding lower cutoff; 0 else }

(4) Slide a forward- and backward-looking window over the time series and calculate all possible takeoff and leveloff points.

- takeoff_all[t] = { 1 if sum(change[t-2..t]) <= -2 and sum(change[t+1..t+3]) >= 0; 0 else }
- leveloff_all[t] = { 1 if sum(change[t-3..t-1]) <= 0 and sum(change[t..t+2]) >= 2; 0 else }

(5) Take the difference over that time series to just pick out the initial points where a wave takes off or levels off.

- takeoff[t] = { 1 if takeoff_all[t] – takeoff_all[t-1] > 0; 0 else }
- leveloff[t] = { 1 if leveloff_all[t] – leveloff_all[t-1] > 0; 0 else }
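
Here is a rough Python sketch of steps (2) through (5) for a single county, just to make the windows concrete. It is not the BigQuery code I ran. A few things in it are assumptions: the ratio in step (3) is taken on the Monday series from step (2), the cutoff for a fall (`DOWN_RATIO`) is a placeholder, step (5) is read as differencing the `_all` series, and all the function names are made up for illustration.

```python
from datetime import timedelta

UP_RATIO = 1.2          # rise threshold from step (3)
DOWN_RATIO = 1 / 1.2    # ASSUMPTION: placeholder cutoff for a fall


def monday_cases(cumulative):
    """Step 2: 14-day difference of cumulative cases, kept on Mondays only.

    `cumulative` maps a date to the cumulative case count of one county.
    Returns a list of (date, cases) pairs sorted by date.
    """
    out = []
    for day in sorted(cumulative):
        earlier = day - timedelta(days=14)
        if day.weekday() == 0 and earlier in cumulative:
            out.append((day, cumulative[day] - cumulative[earlier]))
    return out


def tag_changes(cases):
    """Step 3: +1 for a large rise, -1 for a large fall, 0 otherwise."""
    change = [0] * len(cases)
    for i in range(1, len(cases)):
        prev = cases[i - 1]
        if prev <= 0:
            continue  # ratio undefined; this sketch leaves the tag at 0
        ratio = cases[i] / prev
        if ratio >= UP_RATIO:
            change[i] = 1
        elif ratio <= DOWN_RATIO:
            change[i] = -1
    return change


def window_sum(values, lo, hi):
    """Sum values[lo..hi] inclusive, or None if the window leaves the series."""
    if lo < 0 or hi >= len(values):
        return None
    return sum(values[lo:hi + 1])


def candidates(change):
    """Step 4: flag every point that satisfies the two window conditions."""
    n = len(change)
    takeoff_all, leveloff_all = [0] * n, [0] * n
    for t in range(n):
        back = window_sum(change, t - 2, t)
        fwd = window_sum(change, t + 1, t + 3)
        if back is not None and fwd is not None and back <= -2 and fwd >= 0:
            takeoff_all[t] = 1
        back = window_sum(change, t - 3, t - 1)
        fwd = window_sum(change, t, t + 2)
        if back is not None and fwd is not None and back <= 0 and fwd >= 2:
            leveloff_all[t] = 1
    return takeoff_all, leveloff_all


def first_points(flags):
    """Step 5: keep only the first point of each run of consecutive 1s."""
    return [1 if f - (flags[i - 1] if i else 0) > 0 else 0
            for i, f in enumerate(flags)]
```

Points too close to the start or end of the series, where a window would run off the edge, are simply left at 0 in this sketch.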

That’s a good enough algorithm for my purposes. Here are some pictures of what it finds. It misses a few waves, but on average it seems fine to me.

Let’s look at this over time. Here is the percentage of counties that have a covid takeoff within a 4-week sliding window. Seems about right.
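
For what it’s worth, one way to compute that sliding-window share is sketched below. It assumes weekly 0/1 takeoff flags per county and counts a county if it has at least one takeoff in the trailing 4 weeks. Both the input layout and the name `takeoff_share` are illustrative assumptions, not the query behind the chart.

```python
def takeoff_share(takeoffs_by_county, window=4):
    """Percent of counties with at least one takeoff in the trailing window.

    `takeoffs_by_county` maps a county id to a list of weekly 0/1 flags;
    all lists are assumed to have the same length.
    """
    counties = list(takeoffs_by_county.values())
    n_weeks = len(counties[0])
    shares = []
    for t in range(n_weeks):
        start = max(0, t - window + 1)  # window is shorter in the first weeks
        hit = sum(1 for flags in counties if any(flags[start:t + 1]))
        shares.append(100.0 * hit / len(counties))
    return shares
```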

Now let’s bring in the temperature. For each week, this is the percentage of counties with an average temperature over 75 degrees F that have a covid takeoff, minus the same percentage for all other counties in that week. (75 degrees F is sort of the point where you’d go indoors, because this is just the average daily temperature.) That’s where we see that there really might be something to this simple notion that being indoors makes a difference.
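
In code, the number for a single week could be computed roughly like this. It is a sketch that assumes one `(avg_temp_f, takeoff_flag)` pair per county for that week; the function name and input layout are made up for illustration.

```python
def hot_minus_rest(rows, threshold_f=75.0):
    """Takeoff percentage among hot counties minus that among all others.

    `rows` is a list of (avg_temp_f, takeoff_flag) pairs for one week.
    Returns None if either group is empty.
    """
    hot = [flag for temp, flag in rows if temp > threshold_f]
    rest = [flag for temp, flag in rows if temp <= threshold_f]
    if not hot or not rest:
        return None
    pct_hot = 100.0 * sum(hot) / len(hot)
    pct_rest = 100.0 * sum(rest) / len(rest)
    return pct_hot - pct_rest
```

A positive value means hot counties were more likely to take off that week than the rest.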

Here is another look at the data. This shouldn’t actually be about hot vs. cold – because on both extremes, you’re spending more time indoors. So let’s look at the top-20% hottest counties at any given point in time, the top-20% coldest counties at the same point in time, and all other counties, and the % of counties with a covid takeoff in that time period. That’s below. Seems to me that essentially any time you have a covid wave, the counties in the temperature extremes have a higher chance of breaking out. Kind of makes sense.
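
The grouping for one time period can be sketched as follows, again assuming one `(avg_temp, takeoff_flag)` pair per county. The name `takeoff_by_temperature_group` and the way ties are cut are illustrative choices, not necessarily what the chart uses.

```python
def takeoff_by_temperature_group(rows, tail=0.2):
    """Takeoff percentage for the coldest, middle, and hottest counties.

    `rows` is a list of (avg_temp, takeoff_flag) pairs for one period.
    The coldest and hottest groups each hold `tail` of the counties,
    rounded down. Empty groups map to None.
    """
    ranked = sorted(rows, key=lambda row: row[0])
    k = int(len(ranked) * tail)
    groups = {
        "coldest": ranked[:k],
        "middle": ranked[k:len(ranked) - k],
        "hottest": ranked[len(ranked) - k:],
    }
    result = {}
    for name, members in groups.items():
        if members:
            flags = [flag for _, flag in members]
            result[name] = 100.0 * sum(flags) / len(flags)
        else:
            result[name] = None
    return result
```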

I know I should also look at how high those wave peaks get, but I’ll leave that for another time.