Approximating Pi via the Monte Carlo Method

I think I’m going to write up all my R stuff on RPubs now. So much easier!

http://rpubs.com/philjet/montecarlopi
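
For reference, here’s a minimal sketch of the idea (not necessarily the exact code in the RPubs write-up): sample points uniformly in the unit square and use the fraction landing inside the quarter circle to estimate pi.

#minimal Monte Carlo estimate of pi
set.seed(42)
n<-1e6
x<-runif(n)
y<-runif(n)
#the fraction of points inside the quarter circle approximates pi/4
4*mean(x^2 + y^2 <= 1)
#should come out close to 3.1416 for large n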


Preparing to maybe train for the Knee Knackering North Shore Trail Run

So I did something a bit crazy for someone living in vertically-challenged Southern Ontario: I entered the lotto for the Knee Knackering North Shore Trail Run, a 50k trail race traversing Vancouver’s North Shore mountains via the Baden Powell trail, and amassing 8k feet (2400m) of vertical climb (and an equal amount of descent) along the way. The course itself is both beautiful and highly technical, starting with a killer 1000m vertical climb up to Cypress peak. Needless to say, it’ll be tough to replicate the course conditions during training here on the Escarpment, but the challenge is exactly what makes this interesting.

KKNSTR has been around since 1989, and when I lived in Vancouver (roughly 2005-2014), I was blown away by the fact this was a thing people did. Given that selection is by lottery, many end up trying a few times before finally getting in. So although this year wouldn’t be ideal for me, I figured I’d start trying now and give it a shot every year until I get in.

The lotto is held on Feb 5th, and only 270 runners are selected from all registered entrants. So the odds are against me, but it’s worth starting to think about my training now, taking inventory of the tools I have at my disposal to get me to the finish line in one piece should my name be called.

Oh yeah, I also have another training constraint: all of my runs (or maybe 95% of them) need to start from my front door. I have 3 young kids and don’t have the time to drive out somewhere to go run.

So what are those tools at my disposal? What do I have close by to leverage during training?

  • A treadmill. Treadmills suck. And yes, there seems to be a bit of a stigma attached to calling yourself a trail runner while spending time on a treadmill. But the fact is there’s no other way for me to tackle prolonged climbs. I have a basic model with 10 degrees of incline, and I’ll put it to work power-hiking/running 1000m climbs.
  • A set of 498 stairs up the Escarpment (roughly 90m vert). Reps of the escarpment steps will be a great way to build up strength in my quads as well as improve my overall fitness. Boring, but better than the treadmill.
  • The Bruce Trail. I love the Bruce Trail. The sections close to me are quite technical, with the usual rolling terrain of Southern Ontario trails. I’ll run on the Bruce for most of my long runs.
  • Spring Ultras. I’m lucky to have a few great spring Ultras close by, and will likely get a 50k done sometime in late April or May as part of my training. Build-up races are a great way to dial in nutrition, get your gear sorted out, and of course get in a solid LR. I’ll likely shoot for the Sulphur Springs Trail Run in late May, which is good for about 1000m of vert.

All of this, of course, hinges on my name being called Feb 5th. If it isn’t, I’ll likely shift my focus to UTHC and apply similar training (with less focus on straight vert and extended climbs).

For now, I’m kicking things off easy. I’m shooting for about 30km/week, with a couple days per week dedicated to either the Escarpment stairs, or to hill reps on the mill. I’ll build up to a few consecutive weeks of roughly 80km, focusing on at least a couple back-to-back LRs in order to train on glycogen depleted legs.

Anyway, to be continued! I’ll keep updating about my training here, and will of course keep my fingers crossed on Feb 5th.

A choropleth map of American commute times

[Figure: choropleth map of American commute times]

I just posted my first Kaggle script, which uses American census data to look at commute times. Turns out if you hate commuting, the worst place to live is Maryland and the best is… South Dakota. It also turns out that while women typically earn far less than men, their commute times are pretty well identical. So they spend just as long getting to work to earn less money.

Check it out on Kaggle here: https://www.kaggle.com/philjette/2013-american-community-survey/the-commuters

or on GitHub here: https://github.com/philjette/CensusData
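
For anyone curious what the mapping step might look like in R, here’s a rough sketch (my own illustration, not the code from the Kaggle script) of a state-level choropleth with ggplot2. It assumes the maps and mapproj packages are installed for map_data()/coord_map(), and the commute numbers below are made-up placeholders, not the census figures.

library(ggplot2)
library(dplyr)

#hypothetical summary: one row per state with its mean commute time (minutes)
commutes<-data.frame(region=c("maryland", "south dakota", "new york"),
  mean_commute=c(32, 17, 31))

#join onto the state polygons and fill by commute time
states_map<-map_data("state")
plot_data<-left_join(states_map, commutes, by="region")

ggplot(plot_data, aes(long, lat, group=group, fill=mean_commute)) +
  geom_polygon(color="white", size=0.1) +
  coord_map() +
  labs(fill="Mean commute (min)")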


Fun with the Yelp API

This post details my first attempt at using the Yelp API with R (and Tableau). Going in, I wasn’t quite sure in which direction I wanted to go. I had some big ideas but quickly came to the realization that I should come up with something simple first to figure things out. The search API has one major limitation: it returns a maximum of 20 records per search. This is pretty unkind to “big” ideas. So I decided to go with this:

  • Pull the top 20 highest rated “food” establishments by state
  • Each restaurant has a category. I’d look at the most frequently occurring category in the 20 records returned for a given state, and map the results in Tableau.

Pretty useless, and pretty straightforward to get my feet wet. A couple caveats: the code is ugly, and pretty well 100% old school R. I’m certain there are better ways to go about this, but efficiency wasn’t a concern at least for the time being. I just wanted to throw together something that worked. Also, in terms of the basic API connection I referenced this post at stackoverflow.

consumerKey = "AirdUEybmOOnaU2ctKHj6Q"
consumerSecret = "ZfmDvGS0zZrROiyVxs0T9mlbQ6M"
token = "VS_b5omGNQ6_CCBO5Ou9bDDjBYSrfWK6"
token_secret = "_deyWlngdD0ziH14bIqMMFqgMk0"

require(httr)
require(httpuv)
require(jsonlite)
#load plyr before dplyr so dplyr's functions take precedence
library(plyr); library(dplyr)

# authorization
myapp = oauth_app("YELP", key=consumerKey, secret=consumerSecret)
sig=sign_oauth1.0(myapp, token=token,token_secret=token_secret)

#state codes
states<-read.csv("States.csv", sep=",", header=FALSE)
states<-as.vector(states[,1])
#replace state codes the API mistakes for cities: spell out
#Louisiana, and use Olympia as a proxy for Washington state
states<-replace(states, states=="LA", "Louisiana")
states<-replace(states, states=="WA", "Olympia")

#initialize a list. We'll iterate through each state code,
#create a data frame of the top 20 restaurants
#then save the data frame to a list
rest_data<-list()

#iterate through all 51 states and pull the top
#20 restaurants by rating
for (i in 1:51) {
  #build the search URL (the exact query string was mangled in the original
  #post; this reconstruction assumes the v2 search API's limit, sort
  #(2 = highest rated), category_filter and location parameters)
  yelpurl<-paste0("http://api.yelp.com/v2/search/?limit=20&sort=2&category_filter=food&location=", states[i])
  locationdata=GET(yelpurl, sig)
  locationdataContent = content(locationdata)
  locationdataList=jsonlite::fromJSON(toJSON(locationdataContent))
  rest_data[[i]]<-data.frame(locationdataList)
}

For this first chunk, we take care of the requisite API authorization. I then load up a vector of state codes to iterate through. I ran into a bit of a problem here, as the API confuses the state code LA for the city of Los Angeles, and the state code WA for Washington DC. I’ve got a hacky solution here to replace LA with “Louisiana”, but as far as WA goes I’m simply using Olympia for now. You can also specify a geographical bounding box via coordinates, which would be easy to do (a rough sketch follows below).
We then iterate through each state, do the API call, parse the results, and save each resulting data.frame to a list object.
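
For what it’s worth, here’s roughly what a bounding-box query might look like. This is just my own sketch: the bounds format (sw_latitude,sw_longitude|ne_latitude,ne_longitude) and the example coordinates (very roughly Washington state) are assumptions I haven’t tested against the API.

#sketch of a search by bounding box instead of a location string;
#passing the parameters via query= lets httr handle the URL encoding
locationdata<-GET("http://api.yelp.com/v2/search/", sig,
  query=list(limit=20, sort=2, category_filter="food",
    bounds="45.5,-124.8|49.0,-116.9"))
locationdataContent<-content(locationdata)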

Remember I mentioned this is old-school so you can expect many for-loops.

#names of the features we want to keep
keeps <- c("businesses.rating", "businesses.review_count",
"businesses.name", "businesses.categories", "city", "state")
#iterate through each state and build the dataframe
#of top 20 locations per state
for (i in 1:51) {
  df<-rest_data[[i]]
  #flatten the lists
  df$businesses.categories <- vapply(df$businesses.categories,
paste, collapse = ", ", character(1L))
  df$businesses.rating<-unlist(df$businesses.rating)
  df$businesses.review_count<-unlist(df$businesses.review_count)
  df$businesses.name<-unlist(df$businesses.name)
  df$city<-unlist(df$businesses.location$city)
  df$state<-unlist(df$businesses.location$state_code)
  #keep the fields we want
  df<-df[,(names(df) %in% keeps)]
  rest_data[[i]]<-df
}

The API returns a nested data.frame, which makes binding the list impossible. So I iterate through each data.frame in my rest_data list, flatten as needed, and get rid of the fields I don’t need.

#use bind_rows() from the dplyr package to bind the
#data frames for each state
top_rest<-bind_rows(rest_data)
#aggregate review counts by state. we'll then add a col
#for the top category of each state
top_cat<-aggregate(top_rest$businesses.review_count,
list(state=top_rest$state), sum)

#now we create a new data frame going through each
#state and returning the top category
for (i in 1:51) {
  subset_state<-subset(top_rest, state==top_cat$state[i])
  cats<-paste(subset_state$businesses.categories)
  #switch to lowercase to compare strings
  cats<-tolower(cats)
  #we see many versions of american such as tradamerican,
  #american (new), etc... so replace them all with american
  cats<-gsub("([^, ]+?american)|(american[^,]+)",
    "american", cats)
  #extract only unique words from each category string,
  #as each string represents a resto
  cats<-vapply(lapply(strsplit(cats, " "), unique), paste,
    character(1L), collapse = " ")
  #put everything back together in one big string
  cats<-paste(cats, collapse=' ')
  #split the string by category into a vector
  split_cats<-unlist(strsplit(cats, split=", "))

  #keep the most frequently occurring category for this state
  top_cat$top_cat[i]<-unlist(as.vector(names(sort(table(split_cats),decreasing=TRUE)[1:1])))
}

After going through and fixing each data.frame in my list, I bind everything into one data.frame using bind_rows() from the dplyr package. Maybe the only non old-school thing here. Maybe I should have stuck to rbind() to keep with the trend. We then use aggregate() to summarize by state. The resulting data.frame forms the basis for the visualization.

Now we get to the fun part. Each business record has a comma-delimited category field which looks like “American (New), Italian, Seafood, newamerican, italian, seafood”. For each restaurant I standardize the various forms of “american” to um… “american” (I admit spending way too much time on this, good thing regexr.com is fun), and remove duplicates. I then split everything into individual categories and throw everything (categories for all restaurants in that state) back into one big string. Extract the most frequently occurring category for the state, and we’re done.
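
To see what that standardization step does, here’s the example string above run through the same calls (output shown as a comment):

cats<-tolower("American (New), Italian, Seafood, newamerican, italian, seafood")
gsub("([^, ]+?american)|(american[^,]+)", "american", cats)
#[1] "american, italian, seafood, american, italian, seafood"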

We now have the data.frame top_cat (based on our earlier aggregation) from which we get the state code, the number of reviews for the top 20 results, and the most frequently occurring category.

  state     x            top_cat
1    AK  3172           american
2    AL  2073           american
3    AR  1691           american
4    AZ 10118           american
5    CA 42671 breakfast & brunch
6    CO 13553         sandwiches

I dumped out a csv and brought it into Tableau to map the results. Turns out “American” food is a pretty big deal. How informative!

Embedding doesn’t seem to work great so check it out at Tableau Public here.

The code pulls from the API every time it’s run, so these results will change.

The code can be found via Github here.



Passing the SAP Certified Application Associate exam for Project System (C_TPLM22_64)

So I passed my Certified Application Associate in Project System for SAP ERP (C_TPLM22_64) exam. Just in time for the November announcement on S/4, but I guess that’s how it goes with IT certifications. Project System is hardly the sexiest product in the SAP landscape at the moment, hence while studying I had a reasonably difficult time uncovering details regarding what I might expect from the exam itself, how others had prepared, what resources proved most useful, etc… so I thought, if I were to pass, I’d throw a quick post together outlining my experience.

Preparation 

First off, you’ll find the basic curriculum and practice questions provided by SAP. This pretty well represents the entirety of information you’ll find regarding the exam. Based on the curriculum, I stuck mostly to the PLM series course material. I focused on PLM200, PLM210, PLM220, and PLM230. Due to the  sub-8% weighting of the Reporting category, I skipped PLM240 altogether, assuming applied experience would get me by. This was a mistake, and I would definitely suggest reviewing PLM240. I also read Kieron Dowling’s SAP Project System Handbook.

I was quite new to PS when I sat for the exam, so my prep time will likely differ from yours. I knew I wanted to take the exam about 6 months prior to taking it, but I’d say I only got serious about studying 2 months prior, really ramping up with 1 month to go.

Strategy

Don’t waste your time looking for practice questions. You’ll find a bunch of shady sites offering so-called “brain dumps” of recent exams (which is completely against SAP’s policies), and will waste precious, constructive study time. I did sign up for an account at sapexam.com, which provides legitimate practice questions based on the PLM course materials. The exam questions were much more consulting oriented than those provided by sapexam.com, but I’d say this was still useful if for nothing more than to throw some variety into your study sessions. Just be careful not to fall into the rote memorization trap, where a false sense of confidence is achieved via answer memorization as opposed to actual understanding of the topics. In order to avoid this I kept my practice exams to a minimum, and was careful to review both correct and incorrect answers. From this perspective, sapexam did help guide my studies.

I first read through the PLM course materials quickly without making any annotations. I then went back through each with greater care, highlighting areas of greater importance. I went back through one last time and compiled notes on each course based off of the highlighted areas. Every few days I would do a mini (40 question) practice exam, and identify problem areas. I also used SCN and SAP Help for various topics, such as account assignment. I spent a great deal of time on PLM230 (Project Accounting) as this was my weakest point. From my summarized versions of each course, I made flashcards which were easy to review before bed, with my morning coffee, etc…

The flashcards were only somewhat useful. I wasted time memorizing transaction codes and definitions, when the exam itself focused much more on consulting type questions. There were no “when would you use transaction xxxx” type questions (at least in the exam I took).

Exam Day

I cleared my schedule for 2 days prior to the exam, which I had scheduled on a Friday at 10AM. I did very little studying in these 2 days, but did review problem areas and completed a full practice exam both days. Aside from that, I just relaxed. I made it to the testing centre a bit early on the day of, and was pretty well checked right in to write the exam.

The exam itself was split into categories in accordance with the curriculum provided by SAP. All questions were one of the following types:

  • True/False
  • Multiple choice, single correct answer
  • Multiple choice, multiple correct answers

Of course for those with multiple correct answers, you need to get them all in order to get it right. As I mentioned previously, the questions themselves were frequently consulting oriented, from the point of view of a client requiring particular functionality. Some were definition based. I was lucky that PLM240 had a very low weighting in the exam, because I naively believed that experience alone would suffice. Don’t make this mistake, and while you likely won’t pay as much attention to PLM240 as PLM230, you should still make sure to give it a review.

I made it through the exam quite quickly, not nearly using up the 3 hours allocated. The exam allows you to flag questions for review, and I used this function extensively. Upon completion, you can then go back through your flagged questions prior to submitting.

That’s it!

Head over to the exam invigilator’s desk and collect your printout. Hopefully you’ll see “PASS” and feel a bit of relief. This was my first certification, and a serious morale booster to succeed.

In review

  • Don’t waste time on shady “brain dump” sites. If you want to take some practice exams, sapexam.com is a decent source. Actually, for this exam they’re likely the only legitimate source. For the record, I have no affiliation with them.
  • Don’t fall into the rote memorization trap. Memorizing answers from either the PLM chapter reviews or from sapexam can give you a false sense of readiness. Make sure to focus on consulting scenarios.
  • 2 months of focused prep should be adequate, based on the strategy described above. This is also in line with comments I’ve come across on SCN. That said, significant consulting experience would cut this down.

Good luck!


Ode to a young family

On Sunday, September 27th, 3 young children and their grandfather were killed by a drunk driver North of Toronto.

Over the next weeks, months, and even years, they’ll slowly collapse under the weight of endless platitudes. The children looked so peaceful, the grandfather so young. They’ve gone to a better place now, you’ll rejoin them in the inevitable afterlife. They surely died without suffering, succumbing at once to the impact. God called his precious angels back. Little Jesus loved them too much.

And for their part they will acquiesce. They will give interviews, forced to recount their horror, to endure as the mass media’s search for the elusive ratings bump uncovers details of ever-increasing poignancy. Where were they going, what were they wearing, what’s the last thing they said. Because what better than the mass extinction of a young family to draw in viewers. The outrage! He was from a rich, well to do family! In this photo, we see him seated comfortably, defiantly, in a Ferrari.

But the fact is these endless platitudes, and even the would-be heir turned mass murderer himself mean nothing. We have nothing to say. We fill the air with noises because it comforts us, not them. I can’t even listen to them because what they have to say scares me. What they represent I can’t even comprehend. Life is fickle, short, unpredictable, and meaningless. The universe owes you nothing. The young are in no way owed the future, afforded the beauty and luxury of time.

And why? Because selfishness reigns. Because we can’t get it through our fucking thick skulls that we aren’t alone on this earth, that our actions affect others. That getting behind the wheel after your last beer, just one more beer, can cause someone to wish they died along with the family you took from them. Something is wrong. But we already knew this. How many times must we be shocked into the realization?

If they stood in front of me today, I would have nothing to say. Nothing.


Trimmed Mean in R

I found myself having to compute the trimmed mean today in R, and couldn’t immediately find a function that would accomplish this end. What do I mean (ha!) by the trimmed mean?

The trimmed mean removes outliers before averaging. We define outliers as any value x falling outside the usual 1.5 × IQR fences, i.e. any x such that

x < LQ - 1.5 \times IQR \quad \text{or} \quad x > UQ + 1.5 \times IQR

where LQ and UQ are the lower and upper quartiles.

Here’s our function:

#function to compute the trimmed mean
trimmed_means<-function(data_to_trim){
  #upper and lower outlier fences
  Outlier_UQ = quantile(data_to_trim, .75)+1.5*IQR(data_to_trim)
  Outlier_LQ = quantile(data_to_trim, .25)-1.5*IQR(data_to_trim)
  #keep only the observations inside the fences
  data_trimmed <- data_to_trim[data_to_trim < Outlier_UQ & data_to_trim > Outlier_LQ]
  return(mean(data_trimmed))
}

Simple as that. You can use this in something like ddply() to compare the trimmed mean with the plain mean by group.
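
For example, here’s a quick sketch using the built-in iris data (my own illustration; it assumes the plyr package is loaded and the trimmed_means() function above is defined):

library(plyr)
#compare the plain mean with the trimmed mean of sepal length by species
ddply(iris, .(Species), summarise,
  plain_mean=mean(Sepal.Length),
  trimmed_mean=trimmed_means(Sepal.Length))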

Using the JohnsonJohnson dataset on quarterly earnings (dollars) per Johnson & Johnson share 1960–80, we get

> mean(JohnsonJohnson)
[1] 4.799762
> trimmed_means(JohnsonJohnson)
[1] 4.523902

The trimmed mean above is computed over all observations that fall inside the outlier fences defined above.