Fun with the Yelp API

This post details my first attempt at using the Yelp API with R (and Tableau). Going in, I wasn’t quite sure in which direction I wanted to go. I had some big ideas but quickly came to the realization that I should come up with something simple first to figure things out. The search API has one major limitation: it returns a maximum of 20 records per search. This is pretty unkind to “big” ideas. So I decided to go with this:

  • Pull the top 20 highest rated “food” establishments by state
  • Each restaurant has a category. I’d look at the most frequently occurring category in the 20 records returned for a given state, and map the results in Tableau.

Pretty useless, and pretty straightforward to get my feet wet. A couple of caveats: the code is ugly, and pretty well 100% old-school R. I’m certain there are better ways to go about this, but efficiency wasn’t a concern, at least for the time being. I just wanted to throw together something that worked. Also, for the basic API connection I referenced this post on Stack Overflow.

consumerKey = "AirdUEybmOOnaU2ctKHj6Q"
consumerSecret = "ZfmDvGS0zZrROiyVxs0T9mlbQ6M"
token = "VS_b5omGNQ6_CCBO5Ou9bDDjBYSrfWK6"
token_secret = "_deyWlngdD0ziH14bIqMMFqgMk0"

library(httr)
library(httpuv)
library(jsonlite)
library(plyr)
library(dplyr)

# authorization
myapp = oauth_app("YELP", key=consumerKey, secret=consumerSecret)
sig=sign_oauth1.0(myapp, token=token,token_secret=token_secret)

#state codes
states<-read.csv("States.csv", sep=",", header=FALSE)
states<-as.vector(states[,1])
#replace state codes that clash with city names. We use 
#Olympia as a proxy for Washington
states<-replace(states, states=="LA", "Louisiana")
states<-replace(states, states=="WA", "Olympia")

#initialize a list. We'll iterate through each state code, 
#create a data frame of the top 20 restaurants
#then save the data frame to a list
rest_data<-list()

#iterate through all 51 states and pull the top 
#20 restaurants by rating
for (i in 1:51) {
  #build the search URL for this state: limit=20 results, 
  #sort=2 for highest rated, restricted to the food category
  yelpurl <- paste0("https://api.yelp.com/v2/search/?limit=20&sort=2&category_filter=food&location=", states[i])
  locationdata <- GET(yelpurl, sig)
  locationdataContent <- content(locationdata)
  locationdataList <- jsonlite::fromJSON(toJSON(locationdataContent))
  rest_data[[i]] <- data.frame(locationdataList)
}

For this first chunk, we take care of the requisite API authorization. I then load up a vector of state codes to iterate through. I ran into a bit of a problem here, as the API confuses the state code LA with the city of Los Angeles, and the state code WA with Washington, DC. I’ve got a hacky fix that replaces LA with “Louisiana”, but for WA I’m simply using Olympia for now. You could also specify a geographical bounding box via coordinates, which would be an easy alternative.
We then iterate through each state, do the API call, parse the results, and save each resulting data.frame to a list object.
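
As an aside, the bounding-box alternative mentioned above would just swap the location parameter for coordinates. A minimal sketch, assuming the v2 bounds format of “sw_latitude,sw_longitude|ne_latitude,ne_longitude” (the coordinates here, roughly covering Washington state, are made up for illustration):

#bounding box instead of a location string; the pipe and 
#commas need URL-encoding
bounds <- "45.5,-124.8|49.0,-116.9"
yelpurl <- paste0("https://api.yelp.com/v2/search/?limit=20&sort=2&category_filter=food&bounds=", URLencode(bounds, reserved = TRUE))
locationdata <- GET(yelpurl, sig)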

Remember I mentioned this is old-school so you can expect many for-loops.
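
For what it’s worth, the whole pull could also be collapsed into a single lapply() call instead of the indexed loop above. A quick sketch, reusing the same URL construction:

#one list element per state, no manual index bookkeeping
rest_data <- lapply(states, function(s) {
  yelpurl <- paste0("https://api.yelp.com/v2/search/?limit=20&sort=2&category_filter=food&location=", s)
  data.frame(jsonlite::fromJSON(toJSON(content(GET(yelpurl, sig)))))
})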

#names of the features we want to keep
keeps <- c("businesses.rating", "businesses.review_count", 
"businesses.name", "businesses.categories", "city", "state")
#iterate through each state and build the dataframe 
#of top 20 locations per state
for (i in 1:51) {
  df<-rest_data[[i]]
  #flatten the lists
  df$businesses.categories <- vapply(df$businesses.categories, 
paste, collapse = ", ", character(1L))
  df$businesses.rating<-unlist(df$businesses.rating)
  df$businesses.review_count<-unlist(df$businesses.review_count)
  df$businesses.name<-unlist(df$businesses.name)
  df$city<-unlist(df$businesses.location$city)
  df$state<-unlist(df$businesses.location$state_code)
  #keep the fields we want
  df<-df[,(names(df) %in% keeps)]
  rest_data[[i]]<-df
}

The API returns a nested data.frame, which makes binding the list impossible. So I iterate through each data.frame in my rest_data list, flatten as needed, and get rid of the fields I don’t need.
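
Incidentally, jsonlite also ships a flatten() function that unnests data.frame columns in one shot. I didn’t use it here, but a sketch of the idea, applied to the last raw pull still in scope:

#flatten() turns nested columns like businesses.location into regular
#ones (e.g. businesses.location.city); list columns such as
#businesses.categories still need the vapply/paste treatment above
df_flat <- jsonlite::flatten(data.frame(locationdataList))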

#use bind_rows() from the dplyr package to bind all 
#data frames for each state
top_rest<-bind_rows(rest_data)
#aggregate a new df. we will then include a col for 
#the top category for each state
top_cat<-aggregate(top_rest$businesses.review_count, 
list(state=top_rest$state), sum)

#now we create a new data frame going through each 
#state and returning the top category
for (i in 1:51) {
  subset_state<-subset(top_rest, state==top_cat$state[i])
  cats<-paste(subset_state$businesses.categories)
  #switch to lowercase to compare strings
  cats<-tolower(cats)
  # we see many versions of american such as 
  # tradamerican, american (new), etc... so replace with american
  
  cats<-gsub("([^, ]+?american)|(american[^,]+)", 
"american", cats)
  #extract only unique words from each category string, 
  #as each string represents a restaurant
  cats<-vapply(lapply(strsplit(cats, " "), unique), paste, 
character(1L), collapse = " ")
  #put everything back together in one big string
  cats<-paste(cats, collapse=' ')
  #split string by individual words into a vector
  split_cats<-unlist(strsplit(cats, split=", "))
  
  top_cat$top_cat[i]<-names(sort(table(split_cats), decreasing=TRUE))[1]
}

After going through and fixing each data.frame in my list, I bind everything into one data.frame using bind_rows() from the dplyr package. Maybe the only non-old-school thing here. Maybe I should have stuck with rbind() to keep with the trend. We then use aggregate() to summarize review counts by state. The resulting data.frame forms the basis for the visualization.

Now we get to the fun part. Each business record has a comma-delimited category field that looks like “American (New), Italian, Seafood, newamerican, italian, seafood”. For each restaurant I standardize the various forms of “american” to, um… “american” (I admit spending way too much time on this; good thing regexr.com is fun) and remove duplicates. I then split everything into individual categories, throw everything (categories for all restaurants in that state) back into one big string, extract the most frequently occurring category for the state, and we’re done.
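
To make that concrete, here are the substitution and the tally run on a single made-up category string (the real loop also de-duplicates words within each restaurant’s string first; this strips that out):

cats <- tolower("American (New), Italian, tradamerican")
cats <- gsub("([^, ]+?american)|(american[^,]+)", "american", cats)
cats
#[1] "american, italian, american"
split_cats <- unlist(strsplit(cats, split = ", "))
names(sort(table(split_cats), decreasing = TRUE))[1]
#[1] "american"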

We now have the data.frame top_cat (based on our earlier aggregation) from which we get the state code, the number of reviews for the top 20 results, and the most frequently occurring category.

  state     x            top_cat
1    AK  3172           american
2    AL  2073           american
3    AR  1691           american
4    AZ 10118           american
5    CA 42671 breakfast & brunch
6    CO 13553         sandwiches

I dumped out a csv and brought it into Tableau to map the results. Turns out “American” food is a pretty big deal. How informative!
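
The dump itself was nothing more than a one-liner along these lines (the file name is arbitrary):

#write the aggregated results out for Tableau
write.csv(top_cat, "top_cat.csv", row.names = FALSE)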

Embedding doesn’t seem to work great so check it out at Tableau Public here.

The code pulls from the API every time it’s run, so these results will change.

The code can be found via Github here.



Passing the SAP Certified Application Associate exam for Project System (C_TPLM22_64)

So I passed my Certified Application Associate in Project System for SAP ERP (C_TPLM22_64) exam. Just in time for the November announcement on S/4, but I guess that’s how it goes with IT certifications. Project System is hardly the sexiest product in the SAP landscape at the moment, so while studying I had a reasonably difficult time uncovering details about the exam itself, how others had prepared, what resources proved most useful, etc. So I thought that if I passed, I’d throw a quick post together outlining my experience.

Preparation 

First off, you’ll find the basic curriculum and practice questions provided by SAP. This pretty well represents the entirety of information you’ll find regarding the exam. Based on the curriculum, I stuck mostly to the PLM series course material, focusing on PLM200, PLM210, PLM220, and PLM230. Due to the sub-8% weighting of the Reporting category, I skipped PLM240 altogether, assuming applied experience would get me by. This was a mistake, and I would definitely suggest reviewing PLM240. I also read Kieron Dowling’s SAP Project System Handbook.

I was quite new to PS when I sat for the exam, so my prep time will likely differ from yours. I knew I wanted to take the exam about 6 months prior to taking it, but I’d say I only got serious about studying 2 months prior, really ramping up with 1 month to go.

Strategy

Don’t waste your time looking for practice questions. You’ll find a bunch of shady sites offering so-called “brain dumps” of recent exams (which is completely against SAP’s policies), and you’ll waste precious, constructive study time. I did sign up for an account at sapexam.com, which provides legitimate practice questions based on the PLM course materials. The exam questions were much more consulting-oriented than those provided by sapexam.com, but I’d say this was still useful, if for nothing more than to throw some variety into your study sessions. Just be careful not to fall into the rote memorization trap, where a false sense of confidence is achieved via answer memorization as opposed to actual understanding of the topics. To avoid this I kept my practice exams to a minimum, and was careful to review both correct and incorrect answers. From this perspective, sapexam did help guide my studies.

I first read through the PLM course materials quickly without making any annotations. I then went back through each with greater care, highlighting areas of greater importance. I went back through one last time and compiled notes on each course based on the highlighted areas. Every few days I would do a mini (40 question) practice exam and identify problem areas. I also used SCN and SAP Help for various topics, such as account assignment. I spent a great deal of time on PLM230 (Project Accounting), as this was my weakest point. From my summarized versions of each course, I made flashcards which were easy to review before bed, with my morning coffee, etc.

The flashcards were only somewhat useful. I wasted time memorizing transaction codes and definitions, when the exam itself focused much more on consulting type questions. There were no “when would you use transaction xxxx” type questions (at least in the exam I took).

Exam Day

I cleared my schedule for 2 days prior to the exam, which I had scheduled on a Friday at 10AM. I did very little studying in these 2 days, but did review problem areas and completed a full practice exam both days. Aside from that, I just relaxed. I made it to the testing centre a bit early on the day of, and was pretty well checked right in to write the exam.

The exam itself was split into categories in accordance with the curriculum provided by SAP. All questions were one of the following types:

  • True/False
  • Multiple choice, single correct answer
  • Multiple choice, multiple correct answers

Of course for those with multiple correct answers, you need to get them all in order to get it right. As I mentioned previously, the questions themselves were frequently consulting oriented, from the point of view of a client requiring particular functionality. Some were definition based. I was lucky that PLM240 had a very low weighting in the exam, because I naively believed that experience alone would suffice. Don’t make this mistake, and while you likely won’t pay as much attention to PLM240 as PLM230, you should still make sure to give it a review.

I made it through the exam quite quickly, not nearly using up the 3 hours allocated. The exam allows you to flag questions for review, and I used this function extensively. Upon completion, you can then go back through your flagged questions prior to submitting.

That’s it!

Head over to the exam invigilator’s desk and collect your printout. Hopefully you’ll see “PASS” and feel a bit of relief. This was my first certification, and a serious morale booster to succeed.

In review

  • Don’t waste time on shady “brain dump” sites. If you want to take some practice exams, sapexam.com is a decent source. Actually, for this exam they’re likely the only legitimate source. For the record, I have no affiliation with them.
  • Don’t fall into the rote memorization trap. Memorizing answers from either the PLM chapter reviews or from sapexam can give you a false sense of readiness. Make sure to focus on consulting scenarios.
  • 2 months of focused prep should be adequate, based on the strategy described above. This is also in line with comments I’ve come across on SCN. That said, significant consulting experience would cut this down.

Good luck!


Ode to a young family

On Sunday, September 27th, 3 young children and their grandfather were killed by a drunk driver north of Toronto.

Over the next weeks, months, and even years, they’ll slowly collapse under the weight of endless platitudes. The children looked so peaceful, the grandfather so young. They’ve gone to a better place now, you’ll rejoin them in the inevitable afterlife. They surely died without suffering, succumbing at once to the impact. God called his precious angels back. Little Jesus loved them too much.

And for their part they will acquiesce. They will give interviews, forced to recount their horror, to endure as the mass media’s search for the elusive ratings bump uncovers details of ever-increasing poignancy. Where were they going, what were they wearing, what was the last thing they said. Because what better than the mass extinction of a young family to draw in viewers. The outrage! He was from a rich, well-to-do family! In this photo, we see him seated comfortably, defiantly, in a Ferrari.

But the fact is these endless platitudes, and even the would-be heir turned mass murderer himself mean nothing. We have nothing to say. We fill the air with noises because it comforts us, not them. I can’t even listen to them because what they have to say scares me. What they represent I can’t even comprehend. Life is fickle, short, unpredictable, and meaningless. The universe owes you nothing. The young are in no way owed the future, afforded the beauty and luxury of time.

And why? Because selfishness reigns. Because we can’t get it through our fucking thick skulls that we aren’t alone on this earth, that our actions affect others. That getting behind the wheel after your last beer, just one more beer, can cause someone to wish they died along with the family you took from them. Something is wrong. But we already knew this. How many times must we be shocked into the realization?

If they stood in front of me today, I would have nothing to say. Nothing.
