In this guide, we’ll look at some features of the Foursquare and Twitter APIs.
Before we explore these APIs, let’s step back. Suppose you’re building a new social platform, like Twitter or Foursquare. You want to open up its functionality and data for other developers to hook into, so you’re going to create an API for your platform. However, creating an API for your web service has its pros and cons. Pro: it gives developers everywhere access to a tonne of useful data to build awesome things. Con: it could be abused. Someone could make thousands of calls a second to download or scrape all of the data.
Because of this, services that open up their data with an API often try to balance the openness with a little bit of knowledge about who is using the service. This protects the service from abuse and protects users because everyone who’s accessing data and functionality through the API is known.
They do this by requiring anyone who’s using the API to have an API key. Just as a real-world key allows you to access something, an API key grants you access to a particular API. Moreover, an API key identifies you to the API, which helps the API provider keep track of how their service is used and prevent unauthorized or malicious activity.
API keys are often long alphanumeric strings. We’ve made one up below! (It won’t actually work on anything, but when you receive your own API keys in future projects, they’ll look a lot like this.)
api_key = "FtHwuH8w1RDjQpOr0y0gF3AWm8sRsRzncK3hHh9"
The API key can usually be created in the developer section of the platform documentation and is unique to your user account on the platform.
To do this on Foursquare, we need to create an app, which gives us a client key and secret.
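As a rough illustration of how a key travels with a request, here is a minimal sketch that attaches the made-up key to a query string. The endpoint `https://api.example.com/v1/venues/search` and the parameter name `api_key` are assumptions for illustration only; each platform documents its own URL and parameter names, and some expect the key in a header instead.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; check the provider's documentation for the real one.
BASE_URL = "https://api.example.com/v1/venues/search"
api_key = "FtHwuH8w1RDjQpOr0y0gF3AWm8sRsRzncK3hHh9"  # the made-up key from above


def build_request_url(base_url, key, **params):
    """Return a request URL that identifies the caller by API key."""
    query = dict(params)
    query["api_key"] = key  # assumed parameter name
    return base_url + "?" + urlencode(query)


url = build_request_url(BASE_URL, api_key, near="Chicago", limit=5)
print(url)
```

Because the key is sent on every call, the provider can tie each request back to your account.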
In addition to wanting to know what ‘applications’ or developers are using the API and how, major social platforms also might need to know which user you want data for. For example, if you were using the API to access timeline information and see posts from followed users, you would want to be able to tell the API which user you’re logging into the API as, right?
This is built into most APIs, and the API documentation will tell you which methods (or endpoints) require you to identify the user you’re acting as.
When you want to sign in as a particular user, you’ll ‘authenticate’ using a protocol called OAuth. We won’t get into the details, but if you’ve ever been redirected to a page asking for permission to link an application with your account, you’ve probably used OAuth.
Twitter provides this nice explanation of what OAuth is:
OAuth is an authentication protocol that allows users to approve application to act on their behalf without sharing their password. More information can be found at oauth.net or in the excellent Beginner’s Guide to OAuth from Hueniverse.
OAuth is a multi-step process. You need to provide a few bits of information, but the major thing to know is that the service provider acts as a middleman, so at no point does your code, service or application have direct access to the user’s credentials (username and password). It’s designed to protect users from having to share that information with third parties.
Here’s the rough process of what happens.
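One way to picture the steps is with a toy, in-memory model. This is only a sketch of the idea, not the real OAuth wire protocol: the class `ServiceProvider`, its method names, and the user `alice` are all made up for illustration. The point it tries to show is that the password is only ever handed to the provider, while the application works with tokens.

```python
import secrets


class ServiceProvider:
    """Toy stand-in for the platform that owns the user accounts."""

    def __init__(self):
        self._passwords = {"alice": "s3cret"}  # only the provider stores these
        self._pending = {}                     # request token -> approving user (or None)
        self._access = {}                      # access token -> user

    def issue_request_token(self):
        token = secrets.token_hex(8)
        self._pending[token] = None
        return token

    def user_approves(self, request_token, username, password):
        # Happens on the provider's own page; the application never sees this call.
        if self._passwords.get(username) != password:
            raise PermissionError("bad credentials")
        self._pending[request_token] = username

    def exchange(self, request_token):
        user = self._pending.pop(request_token, None)
        if user is None:
            raise PermissionError("request token was not approved")
        access_token = secrets.token_hex(16)
        self._access[access_token] = user
        return access_token

    def timeline(self, access_token):
        return f"timeline for {self._access[access_token]}"


provider = ServiceProvider()
request_token = provider.issue_request_token()             # 1. app asks for a request token
provider.user_approves(request_token, "alice", "s3cret")   # 2. user logs in on the provider's site
access_token = provider.exchange(request_token)            # 3. app swaps it for an access token
print(provider.timeline(access_token))                     # 4. app calls the API on the user's behalf
```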
Thankfully, for Twitter at least, you don’t need to navigate this process yourself: you just need to grab an access token for your own account through the developer portal. See: https://dev.twitter.com/oauth/overview/application-owner-access-tokens
Another feature of most APIs is rate limits. Rate limits prevent a single user of an API from making hundreds, if not thousands, of API calls in a short space of time. Simply put, they prevent you from abusing the API and make sure you only gather the information you really need.
Twitter, Foursquare and Instagram have thousands of third parties all making requests to their APIs at the same time. Rate limits balance that demand: they make sure the servers can handle the load and that no one third party gets priority, so everyone has equal access to the information they want.
Rate limits are indicated in the documentation for each endpoint, but keep in mind that they often vary from endpoint to endpoint.
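If you want your own code to stay under a limit rather than find out by being rejected, one option is a small client-side limiter. The sketch below is one possible approach, a sliding window that pauses before a call would exceed the limit. The class name `RateLimiter` and the numbers used (15 calls per 15 minutes) are illustrative assumptions; use the figures from the documentation of the endpoint you are calling.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call falls out of the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


limiter = RateLimiter(max_calls=15, period=15 * 60)  # assumed limit, for illustration
for page in range(3):
    limiter.wait()
    # ... make one API call here ...
```

Calling `limiter.wait()` before every request keeps the pacing logic in one place, so a different limit per endpoint is just a different `RateLimiter` instance.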
Have you noticed that most API calls give you historical information? You move through page after page, peeling back from the most recent to the oldest information. This means you’ve got to have a starting point and work backwards, but what if you want to gather stuff AS IT HAPPENS? Great question.
Many APIs also provide a ‘streaming API’. Instead of making calls to an endpoint and getting information that’s already been posted, updated, pinned or shared, you keep a connection open and the streaming API shares updates as they happen. It pushes updates right to your program.
This means you get a lot more data, a lot faster, but you’ve also got to be able to handle that and process it really fast. This is why it’s called a ‘firehose’: it really does open up a deluge of information.
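To give a feel for what consuming pushed updates looks like, here is a minimal sketch. It assumes the stream delivers one JSON object per line with occasional blank keep-alive lines, which is a common convention but not a guarantee for any particular platform. The function `read_stream` is hypothetical, and an in-memory `io.StringIO` stands in for the open network connection.

```python
import io
import json


def read_stream(lines):
    """Yield one decoded update per non-empty line of a line-delimited JSON stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:  # skip blank keep-alive lines
            continue
        yield json.loads(raw)


# Stand-in for an open connection; a real client would iterate over the HTTP response.
fake_connection = io.StringIO(
    '{"id": 1, "text": "hello"}\n'
    "\n"
    '{"id": 2, "text": "world"}\n'
)

for update in read_stream(fake_connection):
    # Keep per-update work small so the reader does not fall behind the stream.
    print(update["id"], update["text"])
```

Because the generator hands over one update at a time, the program never has to hold the whole stream in memory.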
For 99% of the things you do, standard APIs will be the best source, but in some instances you’ll want more and richer data. If you want to access this, you need to plan for it carefully. The good news is that you don’t have to worry about rate limits. The bad news is that sometimes you need to justify why you need access to the streaming API and it can take time to get that access.
Streaming APIs work a little differently from standard ones. If you’re interested, Twitter’s information on Streaming or Realtime APIs is a good starting point.