Application Programming Interface

What is an API?

API stands for "Application Programming Interface."

  • Application: Just like the games you love to play on a tablet or computer.
  • Programming: It's like giving instructions to your computer to make it do what you want.
  • Interface: Think of it as a way of talking or communicating.

API representation


Why Are APIs Important?

APIs let your favorite games and apps share information. Without APIs, every app would be like a toy that can't be played with any other toy. That wouldn't be as much fun, right?


How Does an API Work?

At a restaurant with your family, you choose a meal and tell the waiter, who relays your order to the chef. Once prepared, the waiter brings your meal to you. In this analogy:

  • You represent a computer program requesting information.
  • The waiter acts as the API, delivering requests and returning responses.
  • The chef symbolizes another program that provides the needed information.
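The restaurant analogy above can be sketched in a few lines of Python. Everything here (the `place_order` and `kitchen` functions, the menu) is invented purely for illustration:

```python
def kitchen(dish):
    """The 'chef': another program that produces the requested data."""
    menu = {"pasta": "a plate of pasta", "soup": "a bowl of soup"}
    if dish not in menu:
        raise KeyError(f"{dish} is not on the menu")
    return menu[dish]

def place_order(dish):
    """The 'waiter' (API): carries the request in and the response out."""
    try:
        meal = kitchen(dish)                  # relay the order to the chef
        return {"status": 200, "body": meal}  # bring the meal back
    except KeyError as err:
        return {"status": 404, "body": str(err)}

# The 'customer': a program asking for information.
print(place_order("pasta"))  # {'status': 200, 'body': 'a plate of pasta'}
```

Note that the customer never talks to the chef directly; the waiter (the API) is the only point of contact, which is exactly what makes the two programs interchangeable behind it.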

Visual Representation for API

Example of APIs: think about when you watch cartoons on a tablet. The app on the tablet asks the internet (using an API) to get the cartoon from a faraway computer so you can watch it right where you are.


API architectural styles



Cheat Sheet for API



API Protocols



9 types of API testing

API testing

  1. Smoke Testing: Done after API development is complete. It simply validates that the APIs work and nothing breaks.
  2. Functional Testing: Creates a test plan based on the functional requirements and compares the actual results with the expected results.
  3. Integration Testing: Combines several API calls to perform end-to-end tests. Intra-service communications and data transmissions are tested.
  4. Regression Testing: Ensures that bug fixes or new features don't break the existing behaviors of the APIs.
  5. Load Testing: Tests the application's performance by simulating different loads, so we can calculate the capacity of the application.
  6. Stress Testing: Deliberately creates high loads on the APIs and tests whether the APIs can still function normally.
  7. Security Testing: Tests the APIs against all possible external threats.
  8. UI Testing: Tests the UI's interactions with the APIs to make sure the data is displayed properly.
  9. Fuzz Testing: Injects invalid or unexpected input data into the API and tries to crash it. In this way, it identifies API vulnerabilities.
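Three of these test types (smoke, functional, and fuzz) can be sketched against a toy in-memory API. The `get_cart` endpoint and its data are made up for illustration:

```python
CART_DB = {123: {"id": 123, "items": ["items/321"]}}

def get_cart(cart_id):
    """Toy endpoint: returns (status_code, body)."""
    if not isinstance(cart_id, int):
        return 400, {"error": "cart_id must be an integer"}
    if cart_id not in CART_DB:
        return 404, {"error": "not found"}
    return 200, CART_DB[cart_id]

# 1. Smoke test: does the API respond at all?
status, _ = get_cart(123)
assert status == 200

# 2. Functional test: does the response match the expected result?
status, body = get_cart(123)
assert body == {"id": 123, "items": ["items/321"]}

# 9. Fuzz test: throw unexpected input at the API; it should fail
# gracefully with a 4xx status, never crash.
for bad_input in [None, "abc", -1, 10**9, [1, 2]]:
    status, _ = get_cart(bad_input)
    assert status in (400, 404), f"unexpected status for {bad_input!r}"
```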

API Gateway



API security best practices

APIs are the backbone of modern applications. They expose a very large surface area for attacks, increasing the risk of security vulnerabilities. Common threats include SQL injection, cross-site scripting, and distributed denial of service (DDoS) attacks.

That's why it's crucial to implement robust security measures to protect APIs and the sensitive data they handle. However, many companies struggle to achieve comprehensive API security coverage. They often rely solely on dynamic application security scanning or external pen testing. While these methods are valuable, they may not fully cover the API layer and its increasing attack surface.

12 tips for API security
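One common mitigation for the DDoS threat mentioned above is rate limiting. Below is a minimal token-bucket sketch; the class name and the numbers are illustrative, not from any particular library:

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity`, sustained at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity      # max burst size
        self.tokens = float(capacity)
        self.refill = refill_per_sec  # sustained requests per second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Top up tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # reject: the client should receive HTTP 429

bucket = TokenBucket(capacity=3, refill_per_sec=1)
results = [bucket.allow() for _ in range(5)]  # a burst of 5 requests
# roughly [True, True, True, False, False]: the capacity absorbs the
# first 3 requests, the rest are throttled until tokens refill
```

In production you would keep one bucket per client key (IP, user, or action group, as the design table later suggests), typically in a shared store such as Redis rather than in process memory.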




How to multiply your API performance

API performance


Code first vs API first



Design effective & safe APIs

Guideline                    Not Recommended                  Recommended
Use resource names (nouns)   GET /querycarts/123              GET /carts/123
Use plurals                  GET /cart/123                    GET /carts/123
Idempotency                  POST /carts                      POST /carts { requestId: 4321 }
Use versioning               GET /carts/v1/123                GET /v1/carts/123
Query after soft deletion    GET /carts                       GET /carts?includeDeleted=true
Pagination                   GET /carts                       GET /carts?pageSize=xx&pageToken=xx
Sorting                      GET /items                       GET /items?sort_by=time
Filtering                    GET /items                       GET /items?filter=color:red
Secure Access                X-API-KEY=xxx                    X-API-KEY=xxx, X-EXPIRY=xxx,
                                                              X-REQUEST-SIGNATURE=hmac(URL + QueryString + Expiry + Body)
Resource cross reference     GET /carts/123?item=321          GET /carts/123/items/321
Add an item to a cart        POST /carts/123?addItem=321      POST /carts/123/items:add { itemId: "items/321" }
Rate limit                   No rate limit (DDoS risk)        Rate-limiting rules based on IP, user, action group, etc.
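The request-signing scheme from the "Secure Access" row above, signature = hmac(URL + QueryString + Expiry + Body), can be sketched with Python's standard library. The secret value and helper names are illustrative:

```python
import hashlib
import hmac

def sign_request(secret, url, query_string, expiry, body):
    """Compute HMAC-SHA256 over URL + QueryString + Expiry + Body."""
    message = (url + query_string + expiry + body).encode()
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()

def verify_request(secret, url, query_string, expiry, body, signature):
    expected = sign_request(secret, url, query_string, expiry, body)
    # compare_digest avoids leaking information via timing differences
    return hmac.compare_digest(expected, signature)

secret = "shared-api-secret"
sig = sign_request(secret, "/v1/carts/123", "pageSize=10", "1735689600", "")
assert verify_request(secret, "/v1/carts/123", "pageSize=10", "1735689600", "", sig)
# Tampering with any signed part invalidates the signature:
assert not verify_request(secret, "/v1/carts/999", "pageSize=10", "1735689600", "", sig)
```

Including the expiry in the signed message is what prevents replay attacks: the server rejects requests whose X-EXPIRY has passed, and an attacker cannot extend it without breaking the signature.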

API Pagination

API pagination
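Token-based pagination (GET /carts?pageSize=xx&pageToken=xx) can be sketched as follows. The `list_carts` function, its data, and its token format (a stringified offset) are illustrative assumptions; real APIs often use opaque, encoded tokens instead:

```python
ALL_CARTS = [{"id": i} for i in range(1, 8)]  # 7 carts in "storage"

def list_carts(page_size, page_token=""):
    """Return one page of carts plus a token for the next page."""
    start = int(page_token) if page_token else 0
    page = ALL_CARTS[start:start + page_size]
    next_start = start + page_size
    # An empty token signals the last page.
    next_token = str(next_start) if next_start < len(ALL_CARTS) else ""
    return {"carts": page, "nextPageToken": next_token}

# Client side: walk all pages, 3 carts at a time.
token, seen = "", []
while True:
    resp = list_carts(page_size=3, page_token=token)
    seen += [c["id"] for c in resp["carts"]]
    token = resp["nextPageToken"]
    if not token:
        break
# seen == [1, 2, 3, 4, 5, 6, 7]
```

Compared with page-number (offset) pagination, a continuation token stays stable when rows are inserted or deleted between requests, which is why large APIs tend to prefer it.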


Spark APIs

Spark offers so many different APIs and languages that it can be overwhelming to decide which way is "best."

The 3 Spark APIs

DataFrames
  • Who uses it? Data engineers
  • Strengths: A good middle ground; modular code bases
  • Weaknesses: Not as appealing for SQL-focused professionals
  • When should you pick this? Pipelines that need to stay maintainable

Datasets
  • Who uses it? Software engineers and data engineers
  • Strengths: Static typing; unit testing is a breeze
  • Weaknesses: You have to learn Scala
  • When should you pick this? Pipelines that will be hardened and need a very high quality bar

SparkSQL
  • Who uses it? Anybody who touches Spark
  • Strengths: Flexibility; easy for many people to contribute
  • Weaknesses: No modularity :(
  • When should you pick this? Prototypes that need faster iteration

SparkSQL vs DataFrame vs Dataset

Let's see the tradeoffs between each since there’s a lot of dogma and misinformation out there about it!

SQL APIs are a data scientist's and analyst's best friend. Since SQL is the lingua franca of the data space, SparkSQL should be associated with openness.

SparkSQL is often best for pipelines that:

  • Are built in collaboration with non-engineers
  • Are subject to a lot of mutation and change
  • Only work on data sources that are already in the warehouse / data lake

On the flip side, software engineers think SQL is terrible. They will say, “pick DataFrames because SQL isn’t modular.”

SparkSQL isn’t the best for pipelines that:

  • Leverage 3rd party sources such as REST APIs, Kafka topics, or GraphQL
  • Have complex integration with other systems (e.g. compiled server libraries)
  • Need extensive unit and integration test coverage
  • Need modularity

DataFrames should be associated with a middle ground approach. Analysts and data scientists sometimes know these, and that’s okay if they don’t!

DataFrames are often best for pipelines that:

  • Require fewer changes and are more “hardened”
  • Have 3rd party integrations from REST APIs or other non-table sources
  • Need extensive unit and integration test coverage (Chispa is pretty good for PySpark testing)

Since DataFrames are less well known among other data professionals, they have their own limitations as well.

DataFrames aren’t the best for pipelines that:

  • Need collaboration between many non-engineer professionals
  • Need static typing guarantees that the Dataset API offers

Datasets are the least common API to work with. The main reason is that they're offered only in Java and Scala! The rise of PySpark has made this API less relevant. But when I worked at Airbnb, we were required to use this API for any MIDAS pipelines!

Datasets are often best for pipelines that:

  • Need static typing guarantees. This makes CI/CD much more powerful than it is for Python-based pipelines. Unit testing with Datasets is so good!
  • Are owned by strong JVM-based developers. If your company has tons of strong Java and Scala engineers or you have a backend like Spring Boot or something like that, the integrations here can be powerful!
  • Are part of a larger ecosystem of pipelines with many dependencies. Python dependency management is terrible. Scala’s is vastly superior. Gradle makes pip look like little league!

Datasets aren’t the best for pipelines that:

  • Are owned by engineers that don’t want to cry learning Scala
  • Need faster iteration cycles. Uploading a built JAR is significantly slower than uploading a PySpark script or a SparkSQL query
  • Need to be collaborated on by many non-engineers
