HNSearch API

The HNSearch API enables developers to access HN data programmatically via simple HTTP requests. This documentation describes how to request data from the API and how to interpret the response.

If you have any questions about the API, you can chat with us at our Developer Forum or send us an email (hnsearch@thriftdb.com).

Overview

The HNSearch webapp is a simple JavaScript app that sends queries directly to the api.hnsearch.com bucket at ThriftDB. Because the api.hnsearch.com bucket is public, the ThriftDB REST API serves as the HNSearch API itself.

Core Concepts

Requests and data formats

All requests to the HNSearch API are simple HTTP GET requests. Data collections can be accessed at:

http://api.thriftdb.com/api.hnsearch.com/<collection>

Currently, all responses are returned in JSON format.
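
As a minimal sketch of the request pattern above, the following Python builds a collection URL and decodes a JSON response using only the standard library. The names BASE_URL, collection_url, and fetch_json are hypothetical helpers introduced for illustration, and the shape of the JSON returned by a bare collection request is not described here, so the sketch only decodes it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the public bucket, taken from the URL pattern above.
BASE_URL = "http://api.thriftdb.com/api.hnsearch.com"


def collection_url(collection):
    """Return the URL for a collection such as 'items' or 'users'."""
    return f"{BASE_URL}/{quote(collection, safe='')}"


def fetch_json(url, timeout=10):
    """Send an HTTP GET request and decode the JSON response body.

    Hypothetical helper: it performs a real network request when called.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


items_url = collection_url("items")
users_url = collection_url("users")
```

Calling fetch_json(items_url) would issue the GET request; it is left uncalled here so the sketch can be read and run without network access.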

Collections

Items - http://api.thriftdb.com/api.hnsearch.com/items

  • id - The item's unique integer id (not searchable)
  • parent_id - The parent comment's id
  • parent_sigid - The parent comment's signed id
  • points - The number of points
  • username - The submitter's username
  • type - Item type (submission|comment)
  • url - A submission's URL
  • domain - The domain name of a submission's URL
  • title - A submission's title
  • num_comments - Number of submission comments
  • text - The submission/comment content
  • discussion{} - A comment's parent discussion
    • id - The discussion's item id (not searchable)
    • sigid - The discussion's signed id
    • title - The discussion's item title (not searchable)
  • create_ts - When the item was created
  • cache_ts - When the item was last cached

Users - http://api.thriftdb.com/api.hnsearch.com/users

  • username - The user's unique name
  • about - The user's bio
  • karma - The number of points a user has
  • create_ts - When the user was created
  • cache_ts - When the user object was last cached

Errors

The HNSearch API returns appropriate HTTP status codes for API requests. In particular, it returns a 503 (Service Unavailable) status code when the service is down for maintenance. Your application should handle server errors gracefully in case of planned downtime or an unexpected failure.
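
One way to handle a 503 gracefully is to retry with a growing delay. The sketch below is an illustration only: get_with_retry, its retry count, and its backoff schedule are assumptions, not behavior prescribed by the API. The opener and sleep parameters exist so the function can be exercised without a network connection.

```python
import time
import urllib.error
import urllib.request


def get_with_retry(url, retries=3, delay=2.0,
                   opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a URL, retrying only on 503 (Service Unavailable).

    Hypothetical helper: waits delay, 2*delay, 4*delay, ... seconds
    between attempts and re-raises any other HTTP error immediately.
    """
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on non-503 errors, or when no retries remain.
            if err.code != 503 or attempt == retries:
                raise
            sleep(delay * (2 ** attempt))
```

Other server errors are re-raised so the calling application can decide how to report them.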

REST API Methods

For a full list of ThriftDB REST API methods and arguments please see the ThriftDB REST API documentation.

Search API

For a full explanation of the ThriftDB Search API please see the ThriftDB Search API documentation. The following is a summary of the arguments and response objects for the search endpoint:

http://api.thriftdb.com/api.hnsearch.com/<collection>/_search

URL Arguments

Argument                                    Datatype  Default
q                                           string    none
start                                       integer   0
limit                                       integer   10
sortby                                      string    'score desc'
filter[fields][<fieldname>][]               string    none
filter[queries][]                           string    none
facet[fields][<fieldname>][include]         boolean   false
facet[fields][<fieldname>][exclude_filter]  boolean   true
facet[fields][<fieldname>][start]           integer   0
facet[fields][<fieldname>][limit]           integer   10
facet[queries][]                            string    none
weights[<fieldname>]                        float     1.0
boosts[fields][<fieldname>]                 float     none
boosts[filters][<filterstring>]             float     none
boosts[functions][<functionstring>]         float     none
highlight[markup_items]                     boolean   false
highlight[include_matches]                  boolean   false
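
The bracketed argument names above can be passed as ordinary query-string keys. The following sketch assembles a search URL with Python's urllib.parse.urlencode; search_url is a hypothetical helper, and the particular query and filter values are examples only. urlencode percent-encodes the brackets (for example, [ becomes %5B), which the server is expected to decode, though that has not been verified against ThriftDB here.

```python
from urllib.parse import urlencode

# Search endpoint pattern from this section.
SEARCH_URL = "http://api.thriftdb.com/api.hnsearch.com/{collection}/_search"


def search_url(collection, params):
    """Build a search URL from a list of (argument, value) pairs.

    A list of pairs is used instead of a dict so that array-style
    arguments such as filter[fields][type][] can be repeated.
    """
    return SEARCH_URL.format(collection=collection) + "?" + urlencode(params)


url = search_url("items", [
    ("q", "facebook"),
    ("start", 0),
    ("limit", 20),
    ("filter[fields][type][]", "submission"),
])
```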

Response Object

  • hits - The total number of matched items
  • time - The amount of time it took to process the search request
  • request{} - The request parameters used by the server
    • q
    • start
    • limit
    • sortby
    • filter{}
      • fields{<fieldname>:<fieldvalues>[]}
      • queries[]
    • facet{}
      • fields{}
        • <fieldname>{}
          • include
          • start
          • limit
          • exclude_filter
      • queries[]
    • weights{<fieldname>:<weightvalue>}
    • boosts{}
      • fields{<fieldname>:<boostfactor>}
      • filters{<filterstring>:<boostfactor>}
      • functions{<functionstring>:<boostfactor>}
    • highlight{}
      • markup_items
      • include_matches
  • results[]{} - Sorted list of matched items and highlights
    • item{} - A matched item
    • score
  • facet_results{}
    • fields{}
      • <fieldname>{}
        • facets[]{}
          • value - The value of the facet
          • count - The number of matched items
    • queries{<queryvalue>:<querycount>}
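
To illustrate how the response object might be consumed, the sketch below walks a response shaped like the outline above. SAMPLE_BODY is made-up data written to match the documented field names, not real API output, and summarize and facet_counts are hypothetical helpers.

```python
import json

# Hypothetical response body, invented to match the documented structure.
SAMPLE_BODY = """
{
  "hits": 2,
  "time": 0.01,
  "results": [
    {"score": 1.5, "item": {"id": 1, "type": "submission", "title": "Example story"}},
    {"score": 0.9, "item": {"id": 2, "type": "comment", "text": "Example comment"}}
  ],
  "facet_results": {
    "fields": {"type": {"facets": [{"value": "submission", "count": 1},
                                   {"value": "comment", "count": 1}]}},
    "queries": {}
  }
}
"""


def summarize(response):
    """Return (hits, rows), where each row is (score, type, title or text)."""
    rows = []
    for result in response.get("results", []):
        item = result["item"]
        # Submissions carry a title; comments only carry text.
        label = item.get("title") or item.get("text", "")
        rows.append((result["score"], item.get("type"), label))
    return response["hits"], rows


def facet_counts(response, fieldname):
    """Return {facet value: count} for one faceted field, or {} if absent."""
    fields = response.get("facet_results", {}).get("fields", {})
    facets = fields.get(fieldname, {}).get("facets", [])
    return {facet["value"]: facet["count"] for facet in facets}


response = json.loads(SAMPLE_BODY)
hits, rows = summarize(response)
type_counts = facet_counts(response, "type")
```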

Ranking Algorithm

ThriftDB lets you use field weights and numeric attributes to influence an item's match score. It also lets you boost by more complicated mathematical functions.

The webapp at hnsearch.com uses this combination of field weights, field boosts, and the Hacker News hotness algorithm to rank results:

    "weights": {
        "title"   : 1.1,
        "text"    : 0.7,
        "url"     : 1.0,
        "domain"  : 2.0,
        "username": 0.1,
        "type"    : 0.0
    },
    "boosts": {
        "fields": {
            "points"      : 0.15,
            "num_comments": 0.15
        },
        "functions": {
            "pow(2,div(div(ms(create_ts,NOW),3600000),72))": 200.0
        }
    }

Here's an example search URL:

http://api.thriftdb.com/api.hnsearch.com/items/_search?q=facebook&weights[title]=1.1...
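
The example URL above is truncated. One way to produce the full set of ranking arguments is to flatten the weights and boosts structure into the bracketed names listed under URL Arguments. The sketch below does this in Python; RANKING copies the values used by the webapp, while flatten is a hypothetical helper, and the assumption that the server accepts percent-encoded brackets and function strings has not been verified.

```python
from urllib.parse import urlencode

# Ranking settings used by the hnsearch.com webapp, copied from above.
RANKING = {
    "weights": {
        "title": 1.1, "text": 0.7, "url": 1.0,
        "domain": 2.0, "username": 0.1, "type": 0.0,
    },
    "boosts": {
        "fields": {"points": 0.15, "num_comments": 0.15},
        "functions": {
            "pow(2,div(div(ms(create_ts,NOW),3600000),72))": 200.0,
        },
    },
}


def flatten(value, prefix=""):
    """Yield (argument, value) pairs, e.g. ('boosts[fields][points]', 0.15)."""
    for key, child in value.items():
        name = f"{prefix}[{key}]" if prefix else key
        if isinstance(child, dict):
            yield from flatten(child, name)
        else:
            yield name, child


pairs = [("q", "facebook")] + list(flatten(RANKING))
ranked_url = ("http://api.thriftdb.com/api.hnsearch.com/items/_search?"
              + urlencode(pairs))
```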

Developers are encouraged to use their own ranking algorithms to rank results. For a more thorough explanation of how match scores are calculated, developers can consult the Lucene scoring documentation.
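
As a starting point for experimenting with custom ranking, it can help to see what the function boost used by the webapp evaluates to. The sketch below assumes that ms(a,b) returns a minus b in milliseconds, as it does in Solr function queries; under that assumption the expression equals 2 raised to the negative of the item's age in hours divided by 72, so the factor halves every 72 hours. recency_factor is a hypothetical name, and how the server combines this factor with the 200.0 boost and the match score is not covered here.

```python
from datetime import datetime, timedelta, timezone


def recency_factor(create_ts, now):
    """Evaluate pow(2, div(div(ms(create_ts, NOW), 3600000), 72)) locally.

    Assumes ms(a, b) is a - b in milliseconds (Solr semantics), so
    past items give a negative exponent and a factor below 1.
    """
    ms = (create_ts - now).total_seconds() * 1000.0
    hours = ms / 3600000
    return 2 ** (hours / 72)


now = datetime(2012, 1, 1, tzinfo=timezone.utc)
fresh = recency_factor(now, now)
three_days_old = recency_factor(now - timedelta(hours=72), now)
six_days_old = recency_factor(now - timedelta(hours=144), now)
```

Changing the 72 in the expression changes the half-life: a smaller value favors newer items more strongly.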