Adventures in Smarshland

October 16, 2011

RMySQL Install on Windows XP 32-bit

Filed under: Uncategorized — smarsh @ 4:34 pm

I found the install for RMySQL, an R package that links R to a MySQL database, somewhat cumbersome.  Here are a few things I had to tweak to get it right.

For starters, I was installing using R-2.13.2, RMySQL-0.8-0, and MySQL 5.5.

First, you have to specify the path to MySQL:

For me this was C:\Program Files\MySQL\MySQL Server 5.5, but you must specify it a little bit differently.  Here is the R command to execute from within Rgui.exe:
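The command sets the MYSQL_HOME environment variable so the package build can find MySQL.  A sketch, assuming the default install path above (note that R wants forward slashes in the path):

```r
# Tell RMySQL where MySQL lives; use forward slashes, no trailing slash
Sys.setenv(MYSQL_HOME = "C:/Program Files/MySQL/MySQL Server 5.5")
install.packages("RMySQL", type = "source")
```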


Next, the R install expects a few files (libmysql.dll and libmysql.lib), which in MySQL 5.5 reside in C:\Program Files\MySQL\MySQL Server 5.5\lib, to actually reside in C:\Program Files\MySQL\MySQL Server 5.5\lib\opt.

To fix this, make the subdirectory “opt” and copy the files into it.

You will also need to copy the file “libmysql.dll” from “C:\Program Files\MySQL\MySQL Server 5.5\lib” into your R\bin directory, which is “C:\Program Files\R\R-2.13.2\bin\i386” on 32-bit Windows or “C:\Program Files\R\R-2.13.2\bin\x64” on 64-bit Windows.  Hat-tip to the following StackOverflow answer.

January 1, 2011

Upgrading ReadyNAS NV+ Memory to 1GB

Filed under: Uncategorized — smarsh @ 7:03 pm

My brother had a spare 1GB stick of memory, so I upgraded my ReadyNAS NV+.  The upgrade is rather painless, and there are good guides online.

  • See this YouTube video
  • This guide
  • And make sure to run this memory test; if you don’t, you might find your data being corrupted, as users have reported.  They actually recommend running the test twice because a single pass is only about 85% effective at identifying bad memory.

March 21, 2010

Obamacare likely to pass

Filed under: Politics, Uncategorized — smarsh @ 5:54 pm

With Obama promising to sign an executive order strengthening the ban on using federal funds for abortion, it looks as if the reconciliation bill is likely to become law [1]. At least the people putting real money on the line believe the odds are very high.


January 29, 2010

Obama at his best

Filed under: Politics, Worth Sharing — smarsh @ 6:49 pm


November 15, 2009

Product Review: Sennheiser PXC 450 Noise-Canceling Headphones

Filed under: Reviews — smarsh @ 12:23 am

I recently bought the Sennheiser PXC 450 NoiseGard Active Noise-Canceling Headphones.  I was intrigued by a TED lecture on the physiological, psychological, cognitive, and behavioral effects of noise on our lives.  I work in a sea of cubicles next to someone who has incessant, distracting conversations at inappropriately loud decibel levels.  I also fly a fair amount, not as much as a consultant, but probably a couple standard deviations above the norm.  So for both of these reasons I thought I might benefit from some noise-canceling headphones.  My brother had purchased the same headphones and seemed happy with his purchase.

Overall, I am very happy with the sound quality of the headphones.  The noise canceling seems effective as well: when my furnace, which is quite loud, kicks on, it is barely noticeable with the headphones on and active noise canceling engaged.  I have not had a chance to test the headphones on an airplane flight yet.  I will update the post the next time I take a flight, which will be in a couple weeks for Thanksgiving.


I recently took a trip to visit a friend in St. Louis for Thanksgiving, which afforded me the opportunity to put the noise-canceling headphones to the test.  I was very impressed by their ability to cut out the ambient cabin noise; it made for a much more pleasant and less stressful travel experience.

October 24, 2009

Fresh Install

Filed under: Computer, Worth Sharing — smarsh @ 5:17 pm

In preparing for a fresh Windows 7 install I needed to make an inventory of all the applications I wanted on my machine.  It is a good opportunity to take stock of what I used and what was just clutter.  If you find this post helpful, please write your own post on your blog and leave a link in the comments.  Or if you don’t have a blog, just leave a list in the comments.

So without further ado:


Firefox Extensions

September 27, 2009

Replicating the Temperature Anomaly Dataset

Filed under: Uncategorized — smarsh @ 10:34 pm

I am and will always be a skeptic. For this reason, I have sought to replicate the dataset that is the basis for the research in the global temperature change debate.  One of the great recent achievements of technology is the ability to do very close to the same level of research on commodity hardware as major government-funded projects.  Furthermore, thanks to the open source software movement, much of the software necessary to scrape together large datasets and analyze them is free.  Even better, much of the world’s government data is now freely available, or obtainable through FOIA requests.  The data I used to try to corroborate the global temperature anomaly time series is available from NOAA, through the National Climatic Data Center.

I used their FTP access and some Linux commands to extract all available data.  Unfortunately, the data come in a separate tar file for each year, inside of which is a .gz file for each weather station.  So, using some further Linux commands, I untarred and unzipped these files, then concatenated them together and deleted the header rows.  First you will need to install wget, which can be done through the Synaptic package manager.  The rest of the commands can be executed from a standard Ubuntu terminal.  Of course, you will first need to cd into an appropriate directory.

# GSOD archive on the NCDC FTP server
wget -nc -r -nd -A "*.tar" ftp://ftp.ncdc.noaa.gov/pub/data/gsod/
# --ignore-zeros lets tar read past each archive's end-of-file blocks,
# so the concatenated stream of tar files extracts in one pass
cat *.tar | tar xv --ignore-zeros

for z in *.gz; do gunzip "$z"; done
for z in *.op; do cat "$z" >> data.txt; done

sed -i '/STN/d' data.txt

Then I fired up a MySQL instance and loaded the giant 15 GB file.  I found the following tutorial helpful for setting up the MySQL database.  The installation can be done with the Synaptic package manager, but getting all of the permissions set and the database created can be a little confusing.  After the database was set up, I used a GUI front end, MySQL Administrator, to make crafting the SQL queries a little easier.
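For reference, the database-and-permissions step that tripped me up boils down to something like this (a sketch; the user name and password are placeholders, while “weather” is the database the queries below use):

```sql
-- Run as the MySQL root user; 'scott' and 'changeme' are placeholder credentials
CREATE DATABASE weather;
CREATE USER 'scott'@'localhost' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON weather.* TO 'scott'@'localhost';
FLUSH PRIVILEGES;
```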

The initial load of the data can be accomplished with the following query:

LOAD DATA LOCAL INFILE '/media/Backup Drive/GSOD/data.txt' INTO TABLE weather.test
SET station=SUBSTR(var,1,6),

From this I summarized the data to the yearly level.  I then did some analysis of how clean the data were.  I eliminated any station-year with fewer than 360 days of data.  I then executed some queries to find the set of weather stations that had no missing data between at least 1973 and 2008, and no gaps if a station’s data extends further back.  In other words, these constraints ensure the data are well reported and continuously reported.

Here is the SQL code to generate the data described above:

CREATE TABLE weather.aggregate (SELECT station, year, AVG(temp) AS temp, COUNT(*) AS cnt
FROM weather.test
WHERE year <= 2008
GROUP BY station, year
ORDER BY station, year)

DELETE FROM weather.aggregate WHERE cnt < 360 OR cnt > 366

CREATE TABLE weather.stations (SELECT station, COUNT(*) as num_years, (MAX(year)-MIN(year)+1) AS year_range, MAX(year) AS max_year from weather.aggregate GROUP BY station)

DELETE FROM weather.stations WHERE year_range > num_years OR max_year < 2008 OR num_years < 36

Now we need to index the data to some fixed time period so we can analyze the temperature variation across time.  In statistical terms this is just “de-meaning” the data, allowing for a set of dummy variables across stations.  Since the 1973 to 2008 period is well reported across the entire set of stations, we index to this period:

CREATE TABLE weather.mean_temp (SELECT aggregate.station, avg(aggregate.temp) AS mean_temp
FROM weather.aggregate, weather.stations
WHERE aggregate.year > 1972 AND aggregate.station = stations.station
GROUP BY station)

CREATE TABLE weather.aggregate2 (SELECT a.station, a.year, a.temp, (a.temp - s.mean_temp) AS temp_anom
FROM weather.aggregate a, weather.mean_temp s
WHERE a.station = s.station
ORDER BY a.station, a.year)

Now it is time to do some statistical analysis.  My favorite program for these purposes is R, from CRAN.  R is a statistical programming language with strong support for object types common to statistical analysis, such as matrices and vectors.  Much of the cutting edge in statistical analysis is programmed in R, and it is heavily favored in the academic community.  It is also free open source software (FOSS).  R is similar to Firefox in that it comes with a lot of great base functionality, but much more can be unlocked through add-on packages.

I took the last created table, aggregate2, and exported it as a comma-separated values file called resultset.csv.

I fire up R and import this file to begin the analysis.  R is a scripting language.  Here are the commands I executed in my script:

library(ggplot2)  # provides qplot
data2 = read.table(file="/home/scott/resultset.csv", sep=",", header=TRUE)
qplot(year, temp_anom, data=data2, geom=c("point","smooth","jitter"), alpha=I(1/10),
      ylim=c(-4,4), main="Global Temp. Anom.\nHigh Reliability Stations",
      xlab="Year", ylab="Temp (Deg. F)")

I used variants on the qplot function to create the following graphics outputs:

Taking averages over stations across years we get the graph below; however, this obscures the reliability of the data across years.  Before 1973 we have at best 24 stations to draw from, so the standard error of the process mean explodes by a factor of 3.6.  Before 1946 there are fewer than 10 stations, tapering down to only 1 station in the first two years of the time series.
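The factor of 3.6 is just the 1/sqrt(n) scaling of the standard error of a mean.  Working backward from 24 stations, it implies roughly 24 × 3.6² ≈ 311 stations in the well-reported period (the 311 is my back-of-the-envelope inference, not a count from the data):

```shell
# SE of a mean scales as 1/sqrt(n): dropping from ~311 stations to 24
# inflates the standard error by sqrt(311/24)
awk 'BEGIN { printf "%.1f\n", sqrt(311/24) }'
```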


Looking at the data from the higher-reliability period of 1973 forward, we observe a pattern of increasing temperature.  A basic OLS regression estimates the increase at 4.8 deg. F per century, significant at better than the 0.001 level.  However, OLS may not be an appropriate methodology given the time series nature of the data.  An ARIMA model would probably be more appropriate, but ARIMA models are difficult to estimate with much precision when there are so few observations to build the model on, much less validate or forecast from.  The data would probably be best estimated at the daily level in a time series cross section (TSCS) framework with seasonal differencing.
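For what it’s worth, the OLS fit behind that number is a one-liner in R; a sketch, assuming the data2 data frame loaded above:

```r
# Linear trend on the well-reported era; the slope is in deg. F per year,
# so multiplying by 100 gives the per-century figure
fit <- lm(temp_anom ~ year, data = subset(data2, year >= 1973))
summary(fit)                 # slope, standard error, and p-value
coef(fit)[["year"]] * 100    # trend in deg. F per century
```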


The following graphs are, I think, instructive in understanding the reliability of the data in the pre-1973 period.  Each point is a single station’s observation of annual mean temperature.  The data are slightly jittered and made semi-transparent so you can get a better sense of their density.  As you can see, the estimate of the process mean before 1973 is quite a bit more volatile than in the period from 1973 onward.  However, due to the penalty function used in estimating the smoother’s basis functions, the post-1973 period is over-smoothed and the pre-1973 period is under-smoothed.


According to the fit of the smoothing function on only the post-1973 period, it appears that the process mean is about 1.7 deg. F higher over the course of the 36-year period.  I remain skeptical of the strength of the data for making long-range forecasts of this time series, mainly because of the short duration of the data and the inability to build the model and test it on a holdout period.


It is also important to understand the geospatial element of the data.  The following is a graphic of the locations of the stations used in the analysis.  Coverage is far from homogeneous, especially in the southern hemisphere, and there is no data over the oceans aside from a few islands in the Pacific.


September 10, 2009

Healthcare: Hopefully Information and Analysis You Haven’t Heard Yet

Filed under: Uncategorized — smarsh @ 7:24 pm

Passing universal single-payer federal health insurance legislation has been the political ambition of many influential politicians for decades, to no avail.  It strikes me, upon a little reflection, that this is by design (of our Constitution).  The Reserved Powers clause of the Tenth Amendment, given no prior stipulation of the federal government’s authority to tax for and provide health insurance, should be read to reserve the issue to the States, so each can fashion a law appropriate to its specific circumstances.  Little attention has been paid to the attempts several states have made at providing health coverage.  Hawaii is one notable state that does have a “public” option.  But there are many other liberal states which would likely support single-payer systems, so the question becomes: if not, why not?  The answer, I believe, is two-fold, but first a history lesson.
Why is it that today the typical pooled risk group through which health insurance coverage is purchased is one’s employer?  The system arose during WWII, when the federal government imposed wage restrictions and price controls.  Firms, knowing they must provide a compelling reason to attract and retain top talent, began providing perquisites that were not regulated by the IRS; one such perquisite was health insurance.  Note that the same tax-exempt status of health insurance and other perquisites remains today.  Naturally firms maintain these attractive features because it is cheaper to provide them on a pre-tax basis.  In essence, health insurance through employers exists today for two reasons: its inception was spurred by artificially imposed wage rigidities, and it remains because of tax avoidance strategies.  Other pooled groups could otherwise be formed to solve the adverse selection problem, such as schools, churches, unions, knitting clubs, etc.
But why haven’t states been able to solve the issue, at least where it has a high degree of popular support?  Alas, many have tried, but have failed not for lack of legislative support, but because of federal legislation which “preempts” states’ rights.  States such as California, Colorado, Michigan, and Minnesota have all attempted legislation that levies a payroll tax to effectively create an assigned-risk insurance pool.  Sadly, another breach of the Tenth Amendment has nullified such laws.  Section 514 of the Employee Retirement Income Security Act of 1974 (ERISA) has been the avenue by which corporations have appealed to the federal circuit courts to intervene on their behalf.  Hawaii is the only state other than Massachusetts with state-level health coverage, to the best of my knowledge.  Hawaii is able to provide coverage because its system was set up before the 1974 legislation and is hence grandfathered.  Massachusetts has a barely workable system because the fee charged to employers who don’t provide coverage is so nominal that they have not banded together to challenge the legislation under s.514 of ERISA; if they did, they would likely succeed in getting it overturned.
I would like to provide for an assigned-risk pool coverage option so basic health coverage can be afforded by all, even if I have to subsidize coverage for others, for the same reason I support the socialized provision of fire departments.  I believe that when poor and rich alike are treated for communicable diseases and other basic health problems, everyone benefits; call it an externality if you like.  My bigger beef is that the solutions best suited to different states are likely very different in form.  Washington state, for instance, has a relatively youthful population, lower rates of smoking, and higher rates of physically active people.  Washington probably needs more sports medicine doctors to keep active people active, something a federal cookie-cutter approach likely would not recognize.  On the opposite end of the spectrum, Ohio probably needs more dietitians and smoking cessation specialists, given its high rates of obesity and smoking.

I want to cover several additional topics and will do so when I have the time:
-the market efficiency of high deductibles
-ending federal corn subsidies, which drive the overproduction of corn syrup, as a method of improving health
-the competing “public option” as a violation of the Fifth Amendment’s prohibition on taking private property without just compensation

May 10, 2009

Lightbulb Moment (The New Capitalism)

Filed under: Uncategorized — smarsh @ 3:55 pm

There has been a persistent, nagging problem that has frustrated me about economic science: the measurement of human welfare. It started in college when a skeptic in our class asked why the value of domestic production (for instance, child raising, or homemade crafts kept as family heirlooms) is not counted in the measurement of GDP. The answer given was that the measurement implicitly quantifies the opportunity cost associated with the tradeoff between household production and production of societal wealth. Furthermore, the professor suggested that if child raising is done well, then the children will be more productive members of society in the future, so it can be viewed as an investment in which the stay-at-home parent gives up present income for increased value of their offspring’s future income. But I was still not convinced.  It is worth pausing to state explicitly what GDP is supposed to represent: the market value of all final goods and services produced within the borders of a nation.  So part of the problem is that the transactions I am describing don’t take place in a market, but still have value.

I continued to think it through: what else do we miss in this less-than-perfect measure?  Well, we certainly miss transactions based on barter, or favors done for family and friends, such as when my family helped my uncle put a new roof on his house.  Certainly value was created in that case, but its measure was not recorded.  However, is that just a case of the broken window fallacy in reverse?  Presumably, as a result, my uncle had more disposable income because he did not have to pay a roofing contractor for the labor, and could use that money to purchase other goods and services.  So in that context I think the indirect benefits are still recorded, as wealth is created or preserved and can be used to finance cash flows.  So maybe a similar analogy holds for the case of a stay-at-home parent.  The family would otherwise have to pay for child care, whose quality is probably significantly worse than a parent’s own care, because who cares more about a child’s welfare, a parent or an employee?  In other words, a classic principal-agent problem is avoided.  So maybe GDP is still measuring what it is supposed to, just indirectly.

But I still felt something was amiss.  Two different examples finally struck me hard enough to get me to think about the problem from a different perspective.  I think the problem is best summarized by a quote from the famed pioneer of “value investing,” Benjamin Graham: “price is what you pay, value is what you get.”  The problem with netbooks is that companies, namely Intel and the laptop manufacturers who use its low-power, cheap CPUs, get much less revenue and profit from these laptops than from their more full-fledged counterparts.  But when 95% of what people want to do with a computer is browse the interwebs, maybe write a Word document, and do their Quicken, then you get just as much value at a lower price.  Yet this makes a significant dent in GDP: lower prices mean a lower recorded “market value of final goods.”  So in some sense Moore’s law has led to significant cost deflation and value creation.  GDP does not record value; it records price.  Similar to an infomercial: “But wait, there’s more.”  Isn’t this just like the reverse broken window fallacy I described above?  Doesn’t the lower price mean that consumers are left with more money in their pockets, which they can either save or use to buy goods and services they could not have purchased otherwise?

Harrumph, I thought I had finally figured out what had troubled me all these years.  It wasn’t until I was watching a YouTube video that it finally dawned on me: the Benjamin Graham quote was still right; the reverse broken window fallacy didn’t explain it all.  You see, the YouTube clip was costless to me.  Perhaps you could argue it was ad-supported, but nonetheless very little “value of final goods and services” would be recorded in GDP, yet I enjoyed it tremendously.  Furthermore, if someone had asked me at what price I would be indifferent between paying to watch the video and not watching it, I probably would have said ~$10.  So the economy had produced $10 of value, but recorded only maybe a nickel of advertising revenue.  I think a similar argument can be made in the netbook case: the vast majority of people are getting 95% of the value of an expensive ~$1,200 laptop for the price of $350, plus they can spend the difference on other goods and services, which has a multiplier effect.  What I am trying to illustrate here is not a shocking revelation to economics; it has long been understood that the triangle to the left of the equilibrium on a supply and demand diagram, between the demand curve and the supply curve, is known as consumer and producer surplus.  But this is not how we quantify the success of our economy.  The case of “micronutrients” such as iodine-infused salt is a classic example of something that costs almost nothing but has enormous welfare consequences, preventing goiters and raising IQ in developing children.  In another market that has been receiving a lot of attention lately, the auto industry, the commentary is also about GDP-based analyses of success rather than consumer-welfare-based measures.  One of the criticisms of the move toward a fleet of vehicles smaller and lighter than behemoth SUVs is that they garner a substantially lower price tag, along with lower margins.  But the welfare to society is much the same.
So I’m sure that as the industry retools to produce smaller cars, GDP will be lower as a result, but I am not convinced we will be worse off for the change in terms of welfare.

Since I don’t have time to finish my thoughts, I will, like many college texts, “leave it to the reader” to show what effect a theoretical mass transit system in every major U.S. city, perhaps along the lines of the systems found in New York or London, would have on the automotive industry, GDP, and welfare.

May 3, 2009

Less Obvious Social Security Flaws

Filed under: Uncategorized — smarsh @ 6:40 pm

Each year, three months before your birthday, the Social Security bureaucracy sends you an estimate of your benefits based on “current law.”  Let’s ignore all the obvious actuarial flaws in the Social Security funding mechanism for a moment and look at the raw benefit calculation.  First you calculate your average monthly gross earnings over your highest 35 years of work; the result is called your Average Indexed Monthly Earnings (AIME).  An “indexing” adjustment is made to your earnings to account, in effect, for inflation.  The calculation is then much like calculating taxes, except the marginal rates get smaller as AIME gets larger.  The first $606 of AIME is credited at 90%, the next ~$3,000 at only 32% marginally, and AIME above that adds 15% marginally.  In other words, you don’t have to work very hard to get the majority of the benefit you would receive even if you worked a lot more.  In fact, you only need to earn ~$250,000 over your entire life to max out the first “bracket.”  I am quickly approaching this value and question why I should work so hard, given the manner in which tax brackets and Social Security funding “brackets” are set up.  Taking into consideration the tradeoff between allocating time to labor force participation versus leisure, the government is certainly stacking the deck in favor of leisure.  The one thing that would prevent me from exiting the labor force would be having to pay for medical insurance privately, but I hear our President might “fix” that soon.
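To make the brackets concrete, here is a sketch of the calculation in awk.  The bend points are the figures quoted above (the real SSA bend points change yearly), and the $5,000 AIME is a made-up example:

```shell
# Bracketed benefit formula: 90% of the first $606 of AIME,
# 32% of the next $3,000, and 15% of anything above that
awk -v aime=5000 'BEGIN {
  b1 = 606; b2 = b1 + 3000              # bend points as described in the post
  benefit = 0.90 * (aime < b1 ? aime : b1)
  if (aime > b1) benefit += 0.32 * ((aime < b2 ? aime : b2) - b1)
  if (aime > b2) benefit += 0.15 * (aime - b2)
  printf "monthly benefit: $%.2f\n", benefit
}'

# Lifetime earnings needed to fill the first bracket: $606/month over 35 years
awk 'BEGIN { printf "%d\n", 606 * 12 * 35 }'
```

For a $5,000 AIME this gives $1,714.50 a month, and the second command prints 254520, squaring with the ~$250,000 figure above.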

