Blog CeON-u

11.02.2010 – 14:28

intergraph+network: no hacking necessary

A short update on the story of the ‘network’ and ‘intergraph’ R packages:

A couple of days ago Carter Butts released a new version of the ‘network’ package (ver. 1.5-1). It has a namespace now. Consequently, the ‘intergraph’ package should work out of the box. There is no longer any need to install my hacked version of the ‘network’ package.

By Michał | Posted in ceon, network | Tagged ceon | Comments Off

12.02.2010 – 18:49

R with Vim

For all those who think that Vim is The Editor for text files, and simultaneously think that R is The EnvironmentForStatisticalAnalysisAndGraphics.

After trying out various options for integrating Vim with R, I settled on the following configuration:

Use Vim-R-plugin for editing R code files, R documentation files (*.Rd), and Sweave files. Apart from syntax highlighting, the plugin lets you open an R console in a separate window and operate it with keyboard shortcuts from Vim (no need for frequent alt-tabbing etc.). Among other things you can:
- Execute individual code lines, visually selected portions, or whole R code files in the R console.
- Put the cursor on a function name in the code file and display its R help page or its arguments (through args()).
- Put the cursor on any R object in the code file and apply frequently used functions: str, summary, plot, print, names…
- List the content of the R Workspace
- Clean the R Workspace
I use Sweave quite extensively. For Sweave files the Vim-R-plugin provides the same keyboard mappings as for R code files, and it nicely highlights both the LaTeX code and the R code in the code chunks. As my Sweave files consist mostly of LaTeX code with rather short R code snippets, I would like to take advantage of another Vim plugin: the LaTeX-Suite. By default Vim will not load the LaTeX-Suite for Sweave files, which is a HUGE disadvantage.

Vim and R using Vim-R-Plugin in action

Here is how to use both plugins simultaneously for Sweave files. The instructions apply to Ubuntu (so probably to any Linux-like system). On Windows the ~/.vim directory corresponds to the ‘vimfiles’ directory, which most likely is something like ‘c:\Program Files\Vim\vimfiles’. So:

Install Vim-R-plugin normally.
Install LaTeX-Suite normally.
In ~/.vim/ftplugin remove the symbolic link ‘rnoweb.vim’ and replace it with a normal text file with the following content:

runtime! ftplugin/r.vim
runtime! ftplugin/tex_latexSuite.vim

This will essentially load both plugins one after another. QED.

Edit

See here how to set it up on Mac

By Michał | Posted in ceon, geek, sweave, vim | Tagged ceon | Comments Off

03.08.2011 – 20:23

Math in the social sciences, with discussion

Nice discussion on the usefulness, or lack thereof, of mathematics and formal theory building in the social sciences. Make sure you have a look at the comments. More or less chronologically:

sociology needs more… @ orgtheory.net by Fabio Rojas
math and sociology @ orgtheory.net by Fabio Rojas
methodological convergence in the social sciences @ Marc F. Bellemare

With some appraisal here, here, and to some extent here.

Somewhat in parallel, a discussion about the death of theoretical (read mathematical) economics at econlog:

The decline of economic theory @ econlog by Bryan Caplan
Response on orgtheory.net

All in all, I wholeheartedly subscribe to Fabio’s call.

My subjective list of advantages of formal theory building in social sciences supplementing the one at orgtheory.net:

If a theory is, among other things, a logically coherent set of propositions then formalizing it is just a translation to a language that makes analyzing it, especially deducing consequences, much easier. And this applies to whatever the subject of the theory is.
Most of the empirical studies in sociology are analyzed using some form of statistical reasoning, which is mathematical. Given that, building a formal theory of the studied phenomenon should in principle allow for a tighter connection between the theory and empirics (cf. The Theory-Gap in Social Network Analysis by Mark Granovetter).
I would also add “accumulativeness”, much along the lines of Formal Rational Choice Theory: A Cumulative Science of Politics by David Lalman, Joe Oppenheimer, and Piotr Swistak. Although, I have to admit, after having spent 5 years or so studying mathematical sociology and selected works from mathematical economics, the cumulation is sometimes difficult to observe from the local point of view and local time scale of an individual researcher. There are so many specific models (strong assumptions etc.), and it is frequently hard to understand the bigger picture. Perhaps it is just a matter of time for a “unification” to arrive, … or a researcher…?

By Michał | Posted in ceon | Tagged ceon | Comments Off

06.08.2011 – 19:04

Social contagion story update

In July last year I wrote a note about the stream of papers by Nicholas Christakis and James Fowler (or CF) and coauthors on social contagion of many things (obesity, smoking, loneliness, to name a few). I also wrote about a paper by Russell Lyons that provided a detailed critique of the analyses presented in the papers by CF. See my earlier post for details.

Meanwhile, the paper by Lyons, which had been available through the arXiv repository since July last year, got published in the journal Statistics, Politics, and Policy here (thanks to Ilan Talmud for noticing that). The substantive points remained largely unchanged compared to the arXiv version. However, the author supplemented the paper with a truly hair-raising account of the struggle he had to go through to publish it: rejections from several journals without reviews or even reasonable explanations. I definitely recommend reading it.

While on the subject, I also recommend two other papers related to this “debate”:

The first one is a response of Christakis & Fowler to some other critical comments on related issues:

Fowler, James H. and Nicholas A. Christakis. 2008b. “Estimating peer effects on health in social networks: A response to Cohen-Cole and Fletcher and Trogdon, Nonnemaker, and Pais.” Journal of Health Economics 27:1400–1405.

The second one is by Hans Noel and Brendan Nyhan:

“The “Unfriending” Problem: The Consequences of Homophily in Friendship Retention for Causal Estimates of Social Influence” (download)

in which they use MCMC simulations to show, in short, how network homophily could have confounded the purported contagion effects reported in the studies by CF.

By Michał | Posted in ceon | Tagged ceon | Comments Off

09.14.2011 – 18:55

Shortest paths to/from nodes of a certain type

Elijah asked the following via the SOCNET mailing list:

I was wondering if anyone knew of a script or tool which would give me the network distance of nodes to a particular class of nodes. I think of this as an Erdos number, except instead of getting the distance to one node, I want the distance to the closest node of a particular class. Let’s say I have a network of people and I know their professions. Some are Students, some are Journalists and a small number are Engineers. I’d like to be able to find out the network distance of each node to the closest Engineer node. It would be particularly useful if the script also had the option to total edge weight into the calculation.

If you get your network data into R, it is fairly straightforward to do this using the igraph package. Here is the function:

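# shortest paths to nodes with a specified value on certain node attribute
spnt <- function(g, aname, avalue, weights=NULL, ...)
{
  require(igraph)
  stopifnot(inherits(g, "igraph"))
  # values of the node attribute 'aname' for all nodes
  a <- get.vertex.attribute(g, aname)
  # distance matrix: rows = nodes of the selected type, columns = all nodes
  m <- shortest.paths(g, v=V(g)[a==avalue], weights=weights, ...)
  # for each node (column) take the distance to the closest selected node
  apply(m, 2, min)
}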

It assumes that ‘g’ is a network (an object of class ‘igraph’), ‘aname’ is the name of a node attribute, and ‘avalue’ is the value of that attribute designating the nodes to/from which we would like to calculate distances. Finally, ‘weights’ can optionally be used to include edge weights in the calculation (as a numeric vector).

The function returns a vector of distances in ‘g’ from each node to the closest node that has the value ‘avalue’ on the attribute ‘aname’.

As an example consider the network below. It is undirected and has 15 nodes. It has two attributes defined: a node attribute called “color” having values “orange”, “lightblue”, and “lightgreen”, and an edge attribute called “w” with values 1 or 2. Both attributes are shown in the picture as a node color and edge label. The numbers on the nodes are node ids.

Assuming that the network is called ‘g’ we can use the function above in the following way:

# from lightblue nodes to all others
spnt(g, "color", "lightblue")
## [1] 0 1 2 1 2 3 0 1 2 1 2 3 2 3 4
 
# from orange nodes to all others
spnt(g, "color", "orange")
## [1] 1 0 1 0 1 0 1 0 0 2 1 0 2 1 0
 
# to lightblue, but using weights (shortest path = minimal weight)
spnt(g, "color", "lightblue", weights=E(g)$w)
## [1] 0 2 3 1 2 3 0 2 4 2 3 4 3 5 5

A couple of end notes:

In the result vector you will get 0s for the nodes of the specified type, i.e. in the last example there are 0s for the “lightblue” nodes.
If a certain node is not connected (directly or via other nodes) to any node of the specified type, the vector will contain ‘Inf’ (plus infinity) for that node.
The algorithm will not accept negative weights. Note that shifting the weights by a constant so that they are all positive does not solve this in general: a path with k edges gets longer by k times the constant, so the shortest path under the shifted weights may differ from the one under the original weights, and the results cannot simply be transformed back to the original scale.
You can exploit other features of the ‘shortest.paths’ function, on which this function is based. Any extra arguments to ‘spnt’ are passed to ‘shortest.paths’. For example, if the network is directed you can calculate shortest paths that are either incoming or outgoing (via the ‘mode’ argument). See the help page of ‘shortest.paths’.

By Michał | Posted in ceon | Tagged ceon | Comments Off

09.20.2011 – 18:44

Package ‘intergraph’ (1.1-0) released!

I just released the first official version of the ‘intergraph’ R package.

With the functions provided in the current version (1.1-0) you can convert network data objects between classes ‘igraph’ and ‘network’. The package supports directed and undirected networks, and handles node, tie, and network (graph) attributes. Multiplex networks (i.e., with possibly multiple ties per dyad) are also supported, although not thoroughly tested.

Network objects of class ‘network’ (from package “network”) can be used to store hypergraphs. Conversion of these is not supported at this time.

Both ‘igraph’ and ‘network’ classes can be used to explicitly deal with bipartite networks. Currently, for the bipartite networks, only the conversion from ‘igraph’ to ‘network’ will work. I hope to be able to add the conversion in the other direction in future releases.

You can download and install the package from CRAN. The package sources are hosted on R-Forge here.

By Michał | Posted in ceon | Tagged ceon | Comments Off

12.28.2011 – 02:30

My first use of the state monad

While learning Haskell, I was looking for a concise implementation of a function which "reshapes" a list into a matrix. Given the number of rows r, the number of columns c, and a list vs, the function should take r*c values from the list and create an r-by-c matrix out of them. Here's the type:

toMatrix :: Int -> Int -> [a] -> [[a]]

First solution

First I wrote a simpler function that would split a list into chunks of a given size, like this:

chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf c vs = h : (chunksOf c t)
    where (h, t) = splitAt c vs

Using the above, toMatrix could be implemented this way:

toMatrix r c = chunksOf c . take (r*c)

I had a feeling that a function like chunksOf should already be present somewhere in the standard library, so I asked Hoogle, but to no avail. There was chunksOf in Data.Text, but it operated on Text only (I retroactively named my function after the one in Data.Text). However, Hoogle returned replicateM as well…

Second solution

… and I realized I could use it with the state monad to implement toMatrix. The state could contain the list of values yet to be consumed, and the action to be replicated could be chopping off c values from the list:

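-- assuming the State monad from Control.Monad.State (mtl)
import Control.Monad.State

splitOnce :: Int -> State [a] [a]
splitOnce c = do
    s <- get                  -- values not yet consumed
    let (h, t) = splitAt c s  -- chop off the next c values
    put t                     -- keep the rest as the new state
    return h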

After a while I realized that the same function could be written in a much more concise form:

splitOnce' :: Int -> State [a] [a]
splitOnce' = state . splitAt
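
To see that the two definitions behave alike, one can run them with runState, which returns both the chopped-off chunk and the remaining state. The following is a minimal sketch, assuming that State comes from Control.Monad.State (the mtl package), which the post does not name explicitly; runBoth is a hypothetical helper added only for the comparison.

```haskell
import Control.Monad.State (State, get, put, runState, state)

-- Explicit version: read the state, split it, store the rest.
splitOnce :: Int -> State [a] [a]
splitOnce c = do
    s <- get
    let (h, t) = splitAt c s
    put t
    return h

-- Concise version: splitAt c already has the shape
-- state -> (result, new state), which is what 'state' expects.
splitOnce' :: Int -> State [a] [a]
splitOnce' = state . splitAt

-- Hypothetical helper (not in the original post): run both versions
-- on the same input so that the results can be compared.
runBoth :: Int -> [a] -> (([a], [a]), ([a], [a]))
runBoth c xs = (runState (splitOnce c) xs, runState (splitOnce' c) xs)
```

In GHCi, runBoth 3 [1..7] evaluates to (([1,2,3],[4,5,6,7]),([1,2,3],[4,5,6,7])): both versions return the first three values and leave the rest as the state.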

The solution was, therefore:

toMatrix r = evalState . replicateM r . state . splitAt

Summary

The second version is good enough for me and as a bonus it helped me understand the state monad. Note that the two implementations of toMatrix are not equivalent, as they handle lists shorter than r*c in different ways. Future work: find a concise and preferably point-free implementation of chunksOf.
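
The difference on short lists can be seen by putting the two implementations side by side. This is only a sketch: the names toMatrix1 and toMatrix2 are introduced here for the comparison, and the imports assume the mtl package.

```haskell
import Control.Monad (replicateM)
import Control.Monad.State (evalState, state)

-- First solution from the post, renamed toMatrix1 for comparison.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf c vs = h : chunksOf c t
    where (h, t) = splitAt c vs

toMatrix1 :: Int -> Int -> [a] -> [[a]]
toMatrix1 r c = chunksOf c . take (r*c)

-- Second solution from the post, renamed toMatrix2.
-- replicateM r always runs the splitting action r times,
-- even after the list has been exhausted.
toMatrix2 :: Int -> Int -> [a] -> [[a]]
toMatrix2 r = evalState . replicateM r . state . splitAt
```

With enough input both agree: toMatrix1 3 2 [1..10] and toMatrix2 3 2 [1..10] both give [[1,2],[3,4],[5,6]]. With a list that is too short, toMatrix1 3 2 [1,2,3] gives [[1,2],[3]], while toMatrix2 3 2 [1,2,3] gives [[1,2],[3],[]]: the second version always returns r rows and pads with empty ones.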

Update (2013-01-08)

This answer on StackOverflow contains a very nice implementation of chunksOf.
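
For illustration, one well-known concise formulation uses unfoldr from Data.List. This is a sketch under that assumption and not necessarily the implementation from the linked answer; the name chunks is hypothetical and chosen to avoid clashing with chunksOf above.

```haskell
import Data.List (unfoldr)

-- Point-free in the list argument. unfoldr keeps splitting off
-- n elements forever (yielding empty chunks once the input runs out),
-- so takeWhile stops at the first empty chunk.
chunks :: Int -> [a] -> [[a]]
chunks n = takeWhile (not . null) . unfoldr (Just . splitAt n)
```

For example, chunks 3 [1..7] gives [[1,2,3],[4,5,6],[7]]. Unlike the recursive chunksOf, this version returns an empty list when n is not positive, instead of an infinite list of empty chunks.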

By Łukasz Bolikowski | Posted in ceon | Tagged ceon | Comments Off

01.11.2012 – 01:30

Assorted curiosities: Geography

Fun facts learned while clicking through Wikipedia:

Treasure Island in Ontario, Canada is probably the largest island in a lake in an island in a lake.
Liechtenstein and Uzbekistan are doubly landlocked countries, i.e., all the neighbouring countries are landlocked.
Republic of Kalmykia is the only predominantly Buddhist region of Europe.
The Door to Hell is a 70 m (230 ft) wide hole near the village of Derweze in Turkmenistan, filled with natural gas, which has been burning since 1971.

The Door to Hell near Derweze, Turkmenistan

By Łukasz Bolikowski | Posted in curiosities, wikipedia | Comments Off
