Adam Kawa

Apache Hadoop at CEON, ICM UW

Currently, this post is available here only in Polish. It will be translated as soon as possible.

wpmaster

(Polish) Opening of the Centre for Open Science (Centrum Otwartej Nauki) Blog

 

Assorted curiosities: Geography

Fun facts learned while clicking through Wikipedia:
  • Treasure Island in Ontario, Canada is probably the largest island in a lake in an island in a lake.
  • Liechtenstein and Uzbekistan are doubly landlocked countries, i.e., all the neighbouring countries are landlocked.
  • The Republic of Kalmykia is the only predominantly Buddhist region of Europe.
  • The Door to Hell is a 70 m (230 ft) wide hole near the village of Derweze in Turkmenistan, filled with natural gas, which has been burning since 1971.
The Door to Hell near Derweze, Turkmenistan

Assorted links

Some assorted links collected this week:

There are discussions in various places about the merits, pitfalls, and misunderstandings related to buzzwords such as “big data” and “data science” (what a useless term it is…), and to analyses being “data-driven” or “evidence-based”. Perhaps I will write a separate post on that at some point… For now:

Correction to intergraph update

It turned out that I wrote the last post on the “intergraph” package too hastily. After some feedback from the CRAN maintainers and some deliberation, I decided to release the updated version of “intergraph” under the original name (so no new package “intergraph0”) with version number 1.2. This version relies on the legacy “igraph” version 0.5, which is now called “igraph0”. Package “intergraph” 1.2 is now available on CRAN.

Meanwhile, I’m working on a new version of “intergraph”, scheduled to be version 1.3, which will rely on the new version 0.6 of “igraph”.

I am sorry for the mess.

intergraph+network: no hacking necessary

A short update on network+intergraph R packages story:

A couple of days ago Carter Butts released a new version of the ‘network’ package (ver. 1.5-1). It now has a namespace, so the ‘intergraph’ package should work out of the box. There is no longer any need to install my hacked version of the ‘network’ package.

Math in the social sciences, with discussion

A nice discussion on the usefulness (or lack thereof) of mathematics and formal theory building in the social sciences. Make sure you have a look at the comments. More or less chronologically:

With some appraisal here, here, and to some extent here.

Somewhat in parallel, a discussion about the death of theoretical (read: mathematical) economics at EconLog:

All in all, I wholeheartedly endorse Fabio’s call.

My subjective list of the advantages of formal theory building in the social sciences, supplementing the one at orgtheory.net:

  1. If a theory is, among other things, a logically coherent set of propositions, then formalizing it is just a translation into a language that makes analyzing it, especially deducing its consequences, much easier. This applies regardless of the subject of the theory.
  2. Most empirical studies in sociology are analyzed using some form of statistical reasoning, which is mathematical. Given that, building a formal theory of the studied phenomenon should, in principle, allow for a tighter connection between theory and empirics (cf. The Theory-Gap in Social Network Analysis by Mark Granovetter).
  3. I would also add “accumulativeness”, much along the lines of Formal Rational Choice Theory: A Cumulative Science of Politics by David Lalman, Joe Oppenheimer, and Piotr Swistak. I have to admit, though, that after spending five years or so studying mathematical sociology and selected works from mathematical economics, I find the cumulation sometimes difficult to observe from the local point of view and local time scale of an individual researcher. There are so many specific models (with strong assumptions etc.), and it is frequently hard to see the bigger picture. Perhaps it is just a question of time before a “unification” arrives, … or a researcher…
  4. ?

My first use of the state monad

While learning Haskell, I was looking for a concise implementation of a function that "reshapes" a list into a matrix. Given the number of rows r, the number of columns c, and a list vs, the function should take r*c values from the list and create an r by c matrix out of them. Here's the type:
toMatrix :: Int -> Int -> [a] -> [[a]]

First solution

First, I wrote a simpler function that splits a list into chunks of a given size, like this:
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf c vs = h : chunksOf c t
  where (h, t) = splitAt c vs
Using the above, toMatrix could be implemented this way:
toMatrix r c = chunksOf c . take (r*c)
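For readers who want to try the first solution, here is a minimal sketch of it as a complete module that can be loaded into GHCi or compiled. The `main` function and the sample inputs are illustrative additions, not part of the original post.

```haskell
-- First solution: split a list into chunks, after keeping only r*c elements.

-- | Split a list into consecutive chunks of length c (the last one may be shorter).
-- Assumes c > 0: for c <= 0 and a non-empty list, splitAt never consumes
-- anything, so the result is an infinite list of empty chunks.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf c vs = h : chunksOf c t
  where
    (h, t) = splitAt c vs

-- | Take r*c values from the list and arrange them as r rows of c columns.
toMatrix :: Int -> Int -> [a] -> [[a]]
toMatrix r c = chunksOf c . take (r * c)

main :: IO ()
main = do
  print (chunksOf 3 [1 .. 7 :: Int])     -- [[1,2,3],[4,5,6],[7]]
  print (toMatrix 2 3 [1 .. 10 :: Int])  -- [[1,2,3],[4,5,6]]
```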
I had a feeling that a function like chunksOf should already be present somewhere in the standard library, so I asked Hoogle, but to no avail. There was chunksOf in Data.Text, but it operated on Text only (I retroactively named my function after the one in Data.Text). However, Hoogle returned replicateM as well…

Second solution

… and I realized I could use it with the state monad to implement toMatrix. The state could hold the list of values yet to be consumed, and the action to be replicated could chop off c values from the list:
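import Control.Monad (replicateM)  -- used by toMatrix
import Control.Monad.State         -- State, get, put, state, evalState (mtl package)

splitOnce :: Int -> State [a] [a]
splitOnce c = do
  s <- get                    -- the values not consumed yet
  let (h, t) = splitAt c s    -- cut off the next c values
  put t                       -- keep the rest for later
  return h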
After a while I realized that the same function could be written much more concisely:
splitOnce' :: Int -> State [a] [a]
splitOnce' = state . splitAt
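To see what a single step does, and that the two versions agree, here is a small sketch that runs each of them once with `runState`. It assumes the `mtl` package's Control.Monad.State module (the post does not name its imports); the sample input is illustrative.

```haskell
import Control.Monad.State (State, get, put, runState, state)

-- Explicit version: read the remaining input, cut off c values, store the rest.
splitOnce :: Int -> State [a] [a]
splitOnce c = do
  s <- get
  let (h, t) = splitAt c s
  put t
  return h

-- Concise version: splitAt c already has the shape s -> (result, newState),
-- which is exactly the function that `state` wraps into a State action.
splitOnce' :: Int -> State [a] [a]
splitOnce' = state . splitAt

main :: IO ()
main = do
  print (runState (splitOnce 2) [1 .. 5 :: Int])   -- ([1,2],[3,4,5])
  print (runState (splitOnce' 2) [1 .. 5 :: Int])  -- ([1,2],[3,4,5])
```

The first component of each pair is the chunk that was cut off; the second is the state left for the next step.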
The solution was, therefore:
toMatrix r = evalState . replicateM r . state . splitAt
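Below is a self-contained sketch of the second solution, again assuming `mtl`'s Control.Monad.State. The extra `listCombos` value is an illustrative addition explaining why Hoogle suggested `replicateM` at all: its type, `Int -> m a -> m [a]` for any applicative m, becomes `Int -> [a] -> [[a]]` when m is the list monad, but there it enumerates combinations rather than splitting the list. Only in the state monad does it repeat the "cut off c values" step.

```haskell
import Control.Monad (replicateM)
import Control.Monad.State (evalState, state)

-- Second solution: replicate the "cut off c values" action r times,
-- threading the not-yet-consumed input through the state monad.
toMatrix :: Int -> Int -> [a] -> [[a]]
toMatrix r = evalState . replicateM r . state . splitAt

-- For contrast: in the list monad, replicateM n xs enumerates every
-- length-n sequence of elements of xs instead of splitting xs.
listCombos :: [[Int]]
listCombos = replicateM 2 [0, 1]

main :: IO ()
main = do
  print (toMatrix 2 3 "abcdefgh")  -- ["abc","def"]
  print listCombos                 -- [[0,0],[0,1],[1,0],[1,1]]
```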

Summary

The second version is good enough for me, and as a bonus it helped me understand the state monad. Note that the two implementations of toMatrix are not equivalent, as they handle lists shorter than r*c differently: the first may return fewer than r rows, while the second always returns exactly r rows, the trailing ones short or empty. Future work: find a concise and preferably point-free implementation of chunksOf.
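A minimal sketch that puts both implementations side by side on a list shorter than r*c. The names `toMatrix1` and `toMatrix2` are hypothetical, introduced only so both versions fit in one module; `mtl`'s Control.Monad.State is assumed.

```haskell
import Control.Monad (replicateM)
import Control.Monad.State (evalState, state)

chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf c vs = h : chunksOf c t
  where
    (h, t) = splitAt c vs

-- First solution: the number of rows depends on how much input there is.
toMatrix1 :: Int -> Int -> [a] -> [[a]]
toMatrix1 r c = chunksOf c . take (r * c)

-- Second solution: always exactly r rows; once the input runs out,
-- the remaining rows are short or empty.
toMatrix2 :: Int -> Int -> [a] -> [[a]]
toMatrix2 r = evalState . replicateM r . state . splitAt

main :: IO ()
main = do
  print (toMatrix1 3 2 [1, 2, 3 :: Int])  -- [[1,2],[3]]
  print (toMatrix2 3 2 [1, 2, 3 :: Int])  -- [[1,2],[3],[]]
```

Which behaviour is preferable depends on the caller: the second one guarantees the row count, the first one never produces empty rows.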

Update (2013-01-08)

This answer on StackOverflow contains a very nice implementation of chunksOf.
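The StackOverflow answer itself is not reproduced here. For comparison, one concise formulation (not necessarily the one in that answer) reuses the same `splitAt` step with `unfoldr` from Data.List and is point-free in the list argument:

```haskell
import Data.List (unfoldr)

-- Repeatedly apply splitAt c to produce an infinite (lazy) stream of chunks,
-- then stop at the first empty chunk, i.e. once the input is exhausted.
chunksOf :: Int -> [a] -> [[a]]
chunksOf c = takeWhile (not . null) . unfoldr (Just . splitAt c)

main :: IO ()
main = print (chunksOf 3 [1 .. 7 :: Int])  -- [[1,2,3],[4,5,6],[7]]
```

Unlike the recursive version, this one terminates for c <= 0, because the very first chunk is empty and it returns [] (silently dropping the input).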
