/stopwords/

The /stopwords/ endpoint serves stopwords lists. These lists can be useful when parameterizing the scoring algorithm at the /parallels/ endpoint.

Stopwords lists are typically computed using frequency information. The reasoning is that the most frequent features are typically the least informative (consider, for example, the articles in English).

Of course, depending on your needs, you may choose to exclude certain words recommended by the /stopwords/ endpoint or include words that were not recommended. These decisions can be made when specifying stopwords in your application.

GET

Requesting GET at /stopwords/ provides a JSON object containing a stopwords list. How this stopwords list was created is dependent on the URL query fields used.

By default, a GET at /stopwords/ returns a JSON object containing an empty list.

Request

The following fields may be used in a URL query to specify the parameters by which the stopwords list is created:

Field Name Field Value
feature A string specifying the linguistic feature by which frequencies are calculated; lemmata is the default.
list_size An integer specifying the number of stopwords to include in the stopwords list. 10 is the default.
language A string specifying one of the languages in the Tesserae database; all works in that language will be used to determine feature frequencies.
works A percent-encoded string of the form <object_id 1>,<object_id 2>,..., specifying which works are used to determine feature frequencies.

In the case that both works and language are specified, the language option will take precedence.

Response

On success, the response data payload will contain a JSON object with the key "stopwords", associated with a list of strings.

On failure, the data payload contains error information in a JSON object with the following keys:

Key Value
"data" A JSON object whose keys are the received URL query fields, associated with percent-decoded values.
"message" A string explaining why the request data payload was rejected.

Examples

Get the 10 Highest Frequency Lemmata in Latin

Request:

curl -i -X GET "https://tesserae.caset.buffalo.edu/api/stopwords/?language=latin"

Response:

HTTP/1.1 200 OK
...

{ 
  "stopwords": [
    ...
  ]
}

Get the 20 Highest Frequency Lemmata in Latin

Request:

curl -i -X GET "https://tesserae.caset.buffalo.edu/api/stopwords/?language=latin&list_size=20"

Response:

HTTP/1.1 200 OK
...

{ 
  "stopwords": [
    ...
  ]
}

Get the 15 Highest Frequency Lemmata in Two Specific Texts

Assume that 5c6c69f042facf59122418f6 and 5c6c69f042facf59122418f8 are object IDs of texts in the database.

Request:

curl -i -X GET "https://tesserae.caset.buffalo.edu/api/stopwords/?works=5c6c69f042facf59122418f6%2C5c6c69f042facf59122418f8&list_size=15"

Response:

HTTP/1.1 200 OK
...

{ 
  "stopwords": [
    ...
  ]
}

Attempt to Get a Stopwords List with a Text Not in the Database

Suppose no text has the identifier DEADBEEFDEADBEEFDEADBEEF.

Request:

curl -i -X GET "https://tesserae.caset.buffalo.edu/api/stopwords/?works=DEADBEEFDEADBEEFDEADBEEF&list_size=15"

Response:

HTTP/1.1 400 Bad Request
...

{
  "data": {
    "works": ["DEADBEEFDEADBEEFDEADBEEF"],
    "list_size": 15
  },
  "message": "No text can be found with the identifier provided (DEADBEEFDEADBEEFDEADBEEF)."
}