Exploring The Merriam-Webster API

31 Jan 2022

“Words mean things”

When you read policies or contracts all day, you tend to learn the value of nuance. For example, the words “will” and “shall” have very specific meanings in a contract. While “shall” sounds less commanding than “will”, “shall” is actually a legally binding term meaning “you have a duty to do so”. According to the University of Texas, “shall is the most misused word in all of legal language.”

apt update english

How much effort really goes into defining words?

A lot apparently.

Merriam-Webster, the renowned English dictionary, updates its online index daily, and the task itself does not require scholars or academics. Merriam-Webster has made it clear that its employees simply read a “cross-section of published materials” to determine how frequently a new word is used. After a certain threshold of uses, they add the term to the dictionary, making sure to include citations and examples of how the word is spoken or written. If you go to the online dictionary, you can look up this information on any term. Even if you open a copy of the dictionary from twenty years ago, you will find citations and examples included, especially for newer terms.

The Dictionary Application Programming Interface (API)

Merriam-Webster’s Dictionary API is exactly what you would expect from a 21st-century dictionary company. As with most organizations offering free data, you first have to register for a Developer account to get an API token. Once you’ve got a token, you can query any term (like “murmurously”) without even having to write a program. All you have to do is type a URL into your browser’s address bar:

URL STRUCTURE:

"https://www.dictionaryapi.com/api/v3/references/collegiate/json/" + any_word + "?key=" + your_api_key

EXAMPLE:

https://www.dictionaryapi.com/api/v3/references/collegiate/json/murmurously?key=/
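If you would rather build that URL in code, here is a minimal sketch. The helper name build_url and the placeholder key are my own assumptions, not part of the API. It percent-encodes the word so that terms containing spaces or punctuation still produce a valid URL.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.dictionaryapi.com/api/v3/references/collegiate/json/"

def build_url(word, api_key):
    """Return the query URL for one word (hypothetical helper)."""
    # quote() percent-encodes spaces and other unsafe characters in the path;
    # urlencode() does the same for the key in the query string
    return BASE_URL + quote(word, safe="") + "?" + urlencode({"key": api_key})

# "your_api_key" is a placeholder, not a working key
print(build_url("murmurously", "your_api_key"))
```

A two-word term such as "ice cream" comes out as ice%20cream in the path, which is what a browser would do for you behind the scenes.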

You should receive back a response that looks something like this:

RESULT:

[{"meta":{"id":"murmurous","uuid":"6d150be3-9dbd-40a1-8a8c-8d2cacd16761","sort":"134837000","src":"collegiate","section":"alpha","stems":["murmurous","murmurously"],"offensive":false},"hwi":{"hw":"mur*mur*ous","prs":[{"mw":"\u02c8m\u0259r-m\u0259-r\u0259s","sound":{"audio":"murmur02","ref":"c","stat":"1"}},{"mw":"\u02c8m\u0259rm-r\u0259s"}]},"fl":"adjective","def":[{"sseq":[[["sense",{"dt":[["text","{bc}filled with or characterized by {a_link|murmurs} {bc}low and indistinct"]]}]]]}],"uros":[{"ure":"mur*mur*ous*ly","fl":"adverb"}],"date":"1582","shortdef":["filled with or characterized by murmurs : low and indistinct"]}]
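To show how the pieces of that response fit together, here is a small sketch that pulls the headword, part of speech, and short definitions out of a trimmed copy of the result above. The summarize function is a hypothetical helper of mine. The check for non-dictionary items is a precaution based on my understanding that the API may return plain strings (spelling suggestions) when it finds no matching entry; treat that as an assumption.

```python
import json

# Trimmed copy of the response above (only the fields used here)
raw = """
[{"meta": {"id": "murmurous", "stems": ["murmurous", "murmurously"]},
  "fl": "adjective",
  "date": "1582",
  "shortdef": ["filled with or characterized by murmurs : low and indistinct"]}]
"""

entries = json.loads(raw)

def summarize(entry):
    """Return a one-line summary of an entry (hypothetical helper)."""
    # Anything that is not a dictionary has none of the fields we need
    if not isinstance(entry, dict):
        return None
    headword = entry["meta"]["id"]
    part_of_speech = entry.get("fl", "unknown")
    definitions = "; ".join(entry.get("shortdef", []))
    return f"{headword} ({part_of_speech}): {definitions}"

for entry in entries:
    print(summarize(entry))
```

Note that the query was “murmurously” but the entry's id is “murmurous”: the adverb appears in the stems list and under uros rather than as its own entry.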

Parsing the API’s JSON Responses

To make sense of the data structures, I tend to parse JSON in Python3. In order to automate this process and make sense of the API data, I needed to do the following:

Craft the API’s URL
Submit a GET request for the URL
Receive and evaluate the API response
Parse the response as JSON
Iterate over the JSON items to extract data
Save the results to file(s)

Here’s the short Python3 script I wrote:

#!/usr/bin/env python3

import json
import sys

import requests

#############
# Variables #
#############
word = "innovation"
# placeholder key; replace it with the one from your Developer account
api_key = "12345678-9abc-defg-hijk-lmnopqrstuvw"

#############
# Functions #
#############
def get_word(url):
    # Request the URL using the API key
    print("Requesting " + url)
    url = url + "?key=" + api_key
    # send a GET request to the URL we crafted
    page = requests.get(url)
    # stop if the server response gives a bad status code
    if page.status_code != 200:
        sys.exit("Request failed with status code " + str(page.status_code))
    # convert server response to JSON
    response = json.loads(page.text)
    # tell the user how big the object is
    print("JSON items: " + str(len(response)))
    # return the response value to the calling function
    return response

def pretty(data):
    # convert JSON data to a pretty and human-readable format
    output = json.dumps(data, indent=2)
    return output

########
# MAIN #
########
# do not change this unless you are working in a different dictionary
url = "https://www.dictionaryapi.com/api/v3/references/collegiate/json/" + word

# call our function above using the URL we crafted
response = get_word(url)

# do the following for every item in the JSON response
for i, item in enumerate(response):
    # make the data look good
    result = pretty(item)
    # craft a filename
    filename = "output" + str(i) + ".json"
    # save our JSON data to a new file; "with" closes it for us
    with open(filename, "w") as outfile:
        outfile.write(result)

The script writes one file per item in the response. The first file, output0.json, should look like this:

{
  "meta": {
    "id": "innovation",
    "uuid": "9b912158-88bc-4c78-8b64-baff5c81d396",
    "sort": "091741000",
    "src": "collegiate",
    "section": "alpha",
    "stems": [
      "innovation",
      "innovational",
      "innovations"
    ],
    "offensive": false
  },
  "hwi": {
    "hw": "in*no*va*tion",
    "prs": [
      {
        "mw": "\u02cci-n\u0259-\u02c8v\u0101-sh\u0259n",
        "sound": {
          "audio": "innova04",
          "ref": "c",
          "stat": "1"
        }
      }
    ]
  },
  "fl": "noun",
  "def": [
    {
      "sseq": [
        [
          [
            "sense",
            {
              "sn": "1",
              "dt": [
                [
                  "text",
                  "{bc}a new idea, method, or device {bc}{sx|novelty||}"
                ]
              ]
            }
          ]
        ],
        [
          [
            "sense",
            {
              "sn": "2",
              "dt": [
                [
                  "text",
                  "{bc}the introduction of something new"
                ]
              ]
            }
          ]
        ]
      ]
    }
  ],
  "uros": [
    {
      "ure": "in*no*va*tion*al",
      "prs": [
        {
          "mw": "\u02cci-n\u0259-\u02c8v\u0101-shn\u0259l",
          "sound": {
            "audio": "innova05",
            "ref": "c",
            "stat": "1"
          }
        },
        {
          "mw": "-sh\u0259-n\u1d4al"
        }
      ],
      "fl": "adjective"
    }
  ],
  "date": "15th century{ds||2||}",
  "shortdef": [
    "a new idea, method, or device : novelty",
    "the introduction of something new"
  ]
}
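The definition text is buried several levels deep (def, then sseq, then each sense, then dt) and is sprinkled with tokens such as {bc} and {sx|novelty||}. Here is a rough sketch of how you might walk that structure and clean up the text. Both helper functions are hypothetical, the token handling covers only the tokens that appear in the samples in this post, and anything in sseq that is not a plain "sense" item is skipped.

```python
import re

# The "def" value copied from the output above
definitions = [
    {
        "sseq": [
            [["sense", {"sn": "1", "dt": [["text", "{bc}a new idea, method, or device {bc}{sx|novelty||}"]]}]],
            [["sense", {"sn": "2", "dt": [["text", "{bc}the introduction of something new"]]}]],
        ]
    }
]

def strip_tokens(text):
    """Turn marked-up definition text into plain text (rough cleanup)."""
    # {bc} appears to stand for the colon separator seen in "shortdef"
    text = text.replace("{bc}", ": ")
    # {sx|novelty||} or {a_link|murmurs} -> keep the first field after the pipe
    text = re.sub(r"\{[a-z_]+\|([^|}]*)[^}]*\}", r"\1", text)
    # drop any other token we do not recognize
    text = re.sub(r"\{[^}]*\}", "", text)
    # collapse whitespace and remove the leading colon
    return " ".join(text.split()).lstrip(": ")

def sense_texts(def_list):
    """Yield (sense number, cleaned text) for each plain 'sense' item."""
    for block in def_list:
        for sense_group in block["sseq"]:
            for kind, body in sense_group:
                if kind != "sense":
                    continue  # other item kinds are ignored in this sketch
                for dt_kind, dt_value in body.get("dt", []):
                    if dt_kind == "text":
                        yield body.get("sn", ""), strip_tokens(dt_value)

for number, text in sense_texts(definitions):
    print(number, text)
```

For these two samples the cleaned text matches the strings in shortdef, which is a handy sanity check when you extend the token handling.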

Visualizing the chaos

For this project, I wrote a script to turn the JSON structure into a useful hierarchy chart. Along the way, I accidentally made a new Python module called json4tree. If you’d like to learn more about it, check out the source code on GitHub. Here’s the code I used to convert the JSON for my D3 visualization:

#!/usr/bin/env python3

# import the necessary modules
import json
import json4tree

# import your JSON data
with open("input.json", "r") as infile:
    json_file = json.load(infile)

# create a new handler
converter = json4tree.handler(json_file)

# You can either print your results...
print(converter.results)

# Or you can save your results
with open("output.json", "w") as outfile:
    outfile.write(converter.results)

Here’s a sample of what the resulting JSON file looks like:

{
  "name": "query",
  "children": [
    {
      "name": "meta",
      "children": [
        {
          "name": "id",
          "value": "innovation",
          "type": "str"
        },
        {"..."}
      ]
    }
  ]
}
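To give a feel for the kind of conversion involved, here is a minimal recursive sketch that produces the same name/children/value/type shape as the sample above. This is not json4tree's actual code. The function to_tree is hypothetical, and naming list items by their index is my own assumption.

```python
import json

def to_tree(name, value):
    """Convert a JSON value into a name/children node (hypothetical sketch)."""
    if isinstance(value, dict):
        # each key becomes a child node
        children = [to_tree(key, child) for key, child in value.items()]
        return {"name": name, "children": children}
    if isinstance(value, list):
        # assumption: list items are named by their position
        children = [to_tree(str(index), child) for index, child in enumerate(value)]
        return {"name": name, "children": children}
    # leaves carry the value and the name of its Python type
    return {"name": name, "value": value, "type": type(value).__name__}

sample = {"meta": {"id": "innovation", "offensive": False}}
print(json.dumps(to_tree("query", sample), indent=2))
```

Every branch ends in a leaf with a name, which is the nested shape a D3 hierarchy layout typically consumes.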

And here’s my final product – a D3 hierarchy-based collapsible tree:

go ahead, click on the dots

Special Thanks

I want to give a special thanks to several organizations and people that made this writeup possible.