Kodig

Community Sourced Advice

Cost of home insurance by county in US

We created this choropleth of home insurance values across US counties. We parsed a TopoJSON file of US counties that is available online and plotted the county shapes using the Ruby gem for Gnuplot. The broad steps we followed to create the choropleth are:

Parsing TopoJSON

The TopoJSON file of US counties and states can be downloaded from the internet.

TopoJSON is a JSON format for encoding geographic data structures into a shared topology. The file format supports representing multiple geometry types.

An example geometry for Valley County, MT, taken from the US TopoJSON file, is given below. The ‘id’ in the dictionary is the county's FIPS code. (Here is a list of FIPS codes for US counties)

{"type":"Polygon","id":30105,"arcs":[[3,4,5,6,7,8]]}

A TopoJSON topology represents one or more geometries that share sequences of “arcs”. The value of an arc is an array of positions. Arcs can be referenced by a zero-based index. For example, index 0 refers to the first arc, index 1 refers to the second arc, and so on.

The positions in an arc are delta-encoded. The first position of the arc is a normal position [x₁, y₁]. The second position [x₂, y₂] is encoded as [Δx₂, Δy₂], where x₂ = x₁ + Δx₂ and y₂ = y₁ + Δy₂. The third position [x₃, y₃] is encoded as [Δx₃, Δy₃], where x₃ = x₂ + Δx₃ = x₁ + Δx₂ + Δx₃ and y₃ = y₂ + Δy₃ = y₁ + Δy₂ + Δy₃, and so on.

Additionally, a transform (a scale and a translation) has to be applied to the positions in the arcs to get absolute positions that can be plotted. Geometries can also reference arcs by a negative index, which means the arc is traversed in reverse: -1 is the first arc reversed, -2 is the second arc reversed, and so on. The detailed TopoJSON specification can be found here.
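The transform and the negative arc indices can be sketched in a few lines of Ruby. This is only an illustration under two assumptions taken from the TopoJSON specification: the transform multiplies each quantized position by the scale and then adds the translation, and a negative index i refers to arc ~i traversed backwards. The helper names apply_transform and resolve_arc and all sample values are made up for this example.

```ruby
# Hypothetical helpers; they are not part of any library.

# Convert one quantized [x, y] position into an absolute position
# using the topology's "transform" object.
def apply_transform(point, transform)
  sx, sy = transform['scale']
  tx, ty = transform['translate']
  [point[0] * sx + tx, point[1] * sy + ty]
end

# Look up an arc (already decoded into a list of points) by index.
# A negative index i means "arc ~i, traversed backwards".
def resolve_arc(decoded_arcs, index)
  index >= 0 ? decoded_arcs[index] : decoded_arcs[~index].reverse
end

# Made-up sample values, chosen only to keep the arithmetic easy to follow.
sample_transform = { 'scale' => [0.5, 2], 'translate' => [-100, 10] }
sample_arcs = [[[0, 0], [1, 1]], [[1, 1], [2, 0], [3, 3]]]

p apply_transform([20, 5], sample_transform)
p resolve_arc(sample_arcs, 1)
p resolve_arc(sample_arcs, -2)
```

Here index -2 resolves to the second arc (index 1) with its points in reverse order, which is what lets two neighbouring counties share one border arc while walking it in opposite directions.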

An example of an arc is given below

[[162416,583189],[236,-863],[95,-3199],[219,-1079],[-271,-1241]]
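To make the delta encoding concrete, the short sketch below decodes this example arc by keeping a running sum of the deltas. It stops at the quantized positions; the transform from the file would still have to be applied to get plottable coordinates, and it is left out here because its values are not shown in this post.

```ruby
# The example arc from the text, as delta-encoded positions.
arc = [[162416, 583189], [236, -863], [95, -3199], [219, -1079], [-271, -1241]]

# Running sums of the deltas give the quantized positions.
x = 0
y = 0
decoded = arc.map do |dx, dy|
  x += dx
  y += dy
  [x, y]
end

decoded.each { |point| p point }
```

The first position is unchanged, the second becomes [162652, 582326], and the last one works out to [162695, 576807].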

The Ruby code to parse each arc and build an array that maps each arc index to its absolute positions is given below

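require 'json'

# Load the topology; 'transform' holds the scale/translate pair.
geohash = JSON.parse(File.read('us.json'))
txform = geohash['transform']
arcs = geohash['arcs']

# absarcs[i] = [xs, ys]: the absolute coordinates of arc i.
absarcs = []
arcs.each do |arc|
    xli = []
    yli = []
    prex = 0
    prey = 0
    arc.each do |pt|
        # Undo the delta encoding with a running sum.
        x = pt[0] + prex
        y = pt[1] + prey
        prex = x
        prey = y
        # Apply the transform to get absolute coordinates.
        xli << (x * txform['scale'][0]) + txform['translate'][0]
        yli << (y * txform['scale'][1]) + txform['translate'][1]
    end
    absarcs << [xli, yli]
end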

Plotting the map using Gnuplot

Once the array is built, we can iterate over the geometries in the TopoJSON file, parse the geometry of each county and construct the arrays of positions that make up its shape. These arrays can then be plotted using Gnuplot.

The method we used to plot is given below

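require 'gnuplot'

# Render the given datasets to choropleth.png next to this script.
def plottoimage(data)
    Gnuplot.open do |gp|
        Gnuplot::Plot.new(gp) do |plot|
            # Longitude/latitude window, roughly the contiguous US.
            plot.xrange "[-135:-50]"
            plot.yrange "[20:52]"
            plot.size "1,1"
            plot.terminal "png"
            # "../" steps from the script file up to its directory.
            plot.output File.expand_path("../choropleth.png", __FILE__)
            plot.data = data
        end
    end
end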

When we parse counties and plot them without filling them, the output looks like this

[Image: US county outlines plotted without fill]

Deciding color of the choropleth

The color with which each county should be represented can be calculated with this code

countydata = JSON.parse(File.read('county_data.json'))

# Collect every county's insurance value to find the range.
insvals = []
countydata.each do |cid, cnty|
    insvals << cnty["insval"].to_i
end

$minins, $maxins = insvals.min, insvals.max
$colorlevels = 5

# Map an insurance value to a shade of blue: '#0000ff' for the
# lowest level down to '#000000' for the highest.
def calculate_color(countyval)
    # Integer arithmetic throughout, so the divisions truncate.
    level = (countyval - $minins) * $colorlevels / ($maxins - $minins)
    color = 255 - (level * 255 / $colorlevels)
    '#0000' + color.to_s(16).rjust(2, '0')
end

The code takes the value of home insurance for each county and quantizes it to a shade of blue. It can be made to return any color by tweaking it a little.

Once this is done, one can generate the choropleth by iterating through the geometries, constructing a Gnuplot data object for each and plotting them. The code below does this. (The white areas in the choropleth are present because we chose to ignore a few counties with MultiPolygon geometry.)
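As one possible tweak, the sketch below quantizes a value onto a ramp between two arbitrary RGB colors instead of shades of blue. The helper name ramp_color, the two colors and the sample values 500 and 3000 are all made up for illustration, and it rounds to the nearest level, which differs slightly from the integer arithmetic used above.

```ruby
# Hypothetical helper: map a value onto one of levels + 1 steps
# between two RGB colors and return a '#rrggbb' string.
# Assumes max > min.
def ramp_color(value, min, max, levels, from_rgb, to_rgb)
  # Fraction of the way from min to max, snapped to a whole level.
  level = ((value - min).to_f / (max - min) * levels).round
  level = level.clamp(0, levels)
  # Interpolate each channel linearly for that level.
  rgb = from_rgb.zip(to_rgb).map do |from, to|
    (from + (to - from) * level / levels.to_f).round
  end
  format('#%02x%02x%02x', *rgb)
end

light_yellow = [255, 255, 204]
dark_red = [128, 0, 38]
puts ramp_color(500, 500, 3000, 5, light_yellow, dark_red)
puts ramp_color(3000, 500, 3000, 5, light_yellow, dark_red)
```

The returned string has the same '#rrggbb' form that the blue version produces, so it could be passed to Gnuplot's rgb color option in the same way.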

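data = []

geohash['objects']['counties']['geometries'].each do |allarcs|
    # Only simple polygons are handled; MultiPolygon counties are skipped.
    if allarcs['type'] == 'Polygon'
        id = allarcs['id']
        stx, sty = [], []
        # arcs[0] is the first ring of the polygon (its outer boundary).
        allarcs['arcs'][0].each do |arc|
            if arc >= 0
                pt = absarcs[arc]
                stx = stx + pt[0]
                sty = sty + pt[1]
            else
                # A negative index means the arc is used in reverse.
                pt = absarcs[arc.abs - 1]
                stx = stx + pt[0].reverse
                sty = sty + pt[1].reverse
            end
        end

        begin
            # Filled shape in the county's color...
            data << Gnuplot::DataSet.new([stx, sty]) do |ds|
                ds.with = "filledcurve fs solid 1.0 lc rgb '" + calculate_color(countydata[id.to_s]['insval'].to_i) + "'"
                ds.notitle
            end
            # ...followed by a black outline on top of it.
            data << Gnuplot::DataSet.new([stx, sty]) do |ds|
                ds.with = "lines lc rgb '#000000'"
                ds.notitle
            end
        rescue StandardError => e
            # For example, a county that is missing from county_data.json.
            puts e, "some exception"
        end
    end
end

plottoimage(data)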
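The white areas come from the skipped MultiPolygon counties. A minimal sketch of how they could be included is given below, assuming the layout from the TopoJSON specification: for a Polygon, 'arcs' is a list of rings, and for a MultiPolygon it is a list of polygons, each of which is a list of rings. The helper names ring_to_xy and outer_rings and the tiny sample data are hypothetical, and absarcs is assumed to hold one [xs, ys] pair per arc, as built earlier.

```ruby
# Hypothetical helper: join the arcs of one ring into [xs, ys].
def ring_to_xy(ring, absarcs)
  xs, ys = [], []
  ring.each do |arc|
    if arc >= 0
      ax, ay = absarcs[arc]
    else
      # Negative index: arc ~arc (same as arc.abs - 1), reversed.
      ax, ay = absarcs[~arc].map(&:reverse)
    end
    xs.concat(ax)
    ys.concat(ay)
  end
  [xs, ys]
end

# Return the first (outer) ring of every polygon in a geometry.
def outer_rings(geometry, absarcs)
  polygons =
    case geometry['type']
    when 'Polygon' then [geometry['arcs']]
    when 'MultiPolygon' then geometry['arcs']
    else []
    end
  polygons.map { |rings| ring_to_xy(rings[0], absarcs) }
end

# Tiny made-up topology: two arcs, one geometry with two polygons.
sample_absarcs = [[[0, 1], [0, 0]], [[1, 2], [0, 1]]]
sample_geometry = { 'type' => 'MultiPolygon', 'arcs' => [[[0]], [[-2]]] }
p outer_rings(sample_geometry, sample_absarcs)
```

Each [xs, ys] pair returned this way could be turned into its own filled and outlined data set, in the same way as a single Polygon county.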