Resource URIs
Use yelp as a database/sql-style driver so a host program can address Yelp as yelp:// URIs.
yelp is a command-line tool, but the yelp Go package is also a small driver that
makes Yelp addressable as a resource URI. A host program registers it the way a
program registers a database driver with database/sql, then dereferences
yelp:// URIs without knowing anything about how Yelp is fetched.
The host that does this today is ant, a single
binary that puts one URI namespace over a family of site tools. The examples
below use ant; any program that links the package gets the same behavior.
Mounting the driver
A host enables the driver with one blank import, exactly like
import _ "github.com/lib/pq":
import _ "github.com/tamnd/yelp-cli/yelp"
The package's init registers a domain with the scheme yelp for the hosts
www.yelp.com and yelp.com. The standalone yelp binary does not change.
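The registration pattern can be sketched in a few lines. This is a minimal illustration of init-time registration in the style of database/sql, not the host's actual API: the Domain type, Register, Schemes, and SchemeForHost are hypothetical names, and only the scheme and the two hosts come from the text above.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Domain is a hypothetical description of what a driver contributes:
// a URI scheme and the web hosts whose links it can claim.
type Domain struct {
	Scheme string
	Hosts  []string
}

var (
	mu      sync.RWMutex
	domains = map[string]Domain{}
)

// Register records a domain under its scheme. Like sql.Register, it panics
// on a duplicate so a conflicting import fails loudly at startup.
func Register(d Domain) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := domains[d.Scheme]; dup {
		panic("Register called twice for scheme " + d.Scheme)
	}
	domains[d.Scheme] = d
}

// Schemes lists the registered schemes in sorted order.
func Schemes() []string {
	mu.RLock()
	defer mu.RUnlock()
	out := make([]string, 0, len(domains))
	for s := range domains {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

// SchemeForHost finds the scheme that claims a web host, if any.
func SchemeForHost(host string) (string, bool) {
	mu.RLock()
	defer mu.RUnlock()
	for _, d := range domains {
		for _, h := range d.Hosts {
			if h == host {
				return d.Scheme, true
			}
		}
	}
	return "", false
}

// In a real driver this init would live in the driver package, so that a
// blank import of that package is enough to run it.
func init() {
	Register(Domain{Scheme: "yelp", Hosts: []string{"www.yelp.com", "yelp.com"}})
}

func main() {
	fmt.Println(Schemes())
	fmt.Println(SchemeForHost("www.yelp.com"))
}
```

Because registration happens in init, the host never names the driver in its own code; linking the package is the whole integration step.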
Addressing records
A URI is scheme://authority/id. The resolver types are:
| URI | What it is |
|---|---|
| yelp://biz/<alias> | one business, keyed by its alias |
| yelp://user/<id> | a reviewer's public profile |
ant get yelp://biz/garaje-san-francisco # the business record
ant cat yelp://biz/garaje-san-francisco # just the description body
ant get yelp://user/<id> # a reviewer's profile
ant url yelp://biz/garaje-san-francisco # the live https URL
ant resolve https://www.yelp.com/biz/garaje-san-francisco # a pasted link, back to its URI
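The scheme://authority/id shape maps directly onto the standard net/url parser, which makes the round trip between a URI and a pasted link easy to illustrate. The sketch below is an assumption about how such a mapping could look, not the driver's code: Ref, ParseRef, BizURL, and ResolveBiz are hypothetical names, and only the biz mapping is covered because it is the one the commands above show.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Ref is a parsed resource URI: scheme://authority/id.
type Ref struct {
	Scheme, Authority, ID string
}

// ParseRef splits a resource URI into its three parts.
func ParseRef(raw string) (Ref, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Ref{}, err
	}
	if u.Scheme == "" || u.Host == "" {
		return Ref{}, fmt.Errorf("not a scheme://authority/id URI: %q", raw)
	}
	return Ref{Scheme: u.Scheme, Authority: u.Host, ID: strings.TrimPrefix(u.Path, "/")}, nil
}

// String renders the Ref back into URI form.
func (r Ref) String() string { return r.Scheme + "://" + r.Authority + "/" + r.ID }

// BizURL returns the live https URL for a business Ref (assumed mapping).
func BizURL(r Ref) (string, error) {
	if r.Scheme != "yelp" || r.Authority != "biz" || r.ID == "" {
		return "", fmt.Errorf("not a business URI: %s", r)
	}
	return "https://www.yelp.com/biz/" + r.ID, nil
}

// ResolveBiz turns a pasted business link back into its URI.
func ResolveBiz(link string) (Ref, error) {
	u, err := url.Parse(link)
	if err != nil {
		return Ref{}, err
	}
	if u.Host != "www.yelp.com" && u.Host != "yelp.com" {
		return Ref{}, fmt.Errorf("host %q is not a Yelp host", u.Host)
	}
	if !strings.HasPrefix(u.Path, "/biz/") {
		return Ref{}, fmt.Errorf("path %q is not a business page", u.Path)
	}
	alias := strings.Trim(strings.TrimPrefix(u.Path, "/biz/"), "/")
	if alias == "" || strings.Contains(alias, "/") {
		return Ref{}, fmt.Errorf("no single alias in path %q", u.Path)
	}
	return Ref{Scheme: "yelp", Authority: "biz", ID: alias}, nil
}

func main() {
	r, _ := ParseRef("yelp://biz/garaje-san-francisco")
	live, _ := BizURL(r)
	back, _ := ResolveBiz(live)
	fmt.Println(live)
	fmt.Println(back)
}
```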
biz is best-effort on the web plane (a request from a datacenter IP may hit Yelp's
bot wall and report need-auth) and reliable on the fusion plane when YELP_API_KEY is set.
user is web plane only, since the Fusion API has no user endpoint. See
what anonymous access reaches.
Collections
ls lists the members of a collection. Each list operation has its own
authority, so they never shadow one another:
| URI | What it lists |
|---|---|
| yelp://search/<term> | businesses matching a term |
| yelp://reviews/<alias> | a business's reviews |
| yelp://suggest/<prefix> | autocomplete suggestions |
| yelp://categories | the Yelp category taxonomy |
ant ls yelp://search/tacos # businesses matching a term
ant ls yelp://reviews/garaje-san-francisco # the business's reviews
ant ls yelp://suggest/coffee # autocomplete a prefix
Scope a search to a place through the host's query options, the same as
--location on the command line.
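Giving each list operation its own authority means the authority alone decides whether a URI names one record or a list, even when two URIs share the same id. A small sketch of that dispatch follows; the Kind type, the authorities table, and KindOf are hypothetical, and the six authority names are the ones listed in the tables above.

```go
package main

import (
	"fmt"
	"net/url"
)

// Kind says how a URI is read.
type Kind string

const (
	Resolver   Kind = "resolver"   // one record, read with get
	Collection Kind = "collection" // a list of records, read with ls
)

// authorities maps each authority to its kind. Every authority has one
// entry, so a list can never shadow a record or another list.
var authorities = map[string]Kind{
	"biz":        Resolver,
	"user":       Resolver,
	"search":     Collection,
	"reviews":    Collection,
	"suggest":    Collection,
	"categories": Collection,
}

// KindOf classifies a URI by its authority alone.
func KindOf(raw string) (Kind, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	k, ok := authorities[u.Host]
	if !ok {
		return "", fmt.Errorf("unknown authority %q", u.Host)
	}
	return k, nil
}

func main() {
	for _, raw := range []string{
		"yelp://biz/garaje-san-francisco",
		"yelp://reviews/garaje-san-francisco",
		"yelp://categories",
	} {
		k, err := KindOf(raw)
		fmt.Println(raw, k, err)
	}
}
```

The first two URIs carry the same alias, yet one resolves to a business and the other to that business's reviews.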
Walking the graph
Every record carries explicit edges to the records it points at, so a host can
breadth-first crawl the site and write it to disk without scraping URLs out of
free text. A resolver edge names a bare field and points at one record; a
collection edge carries the parent id under a <name>_ref field and points at a
list authority. The edges are:
| From | Field | Edge to |
|---|---|---|
| Suggestion | search_ref | yelp://search/<text> |
| Suggestion | business | yelp://biz/<alias> |
| Business | reviews_ref | yelp://reviews/<alias> |
| Business | category_ref | yelp://search/<alias> |
| Review | business | yelp://biz/<alias> |
| Review | author_id | yelp://user/<id> |
| Category | search_ref | yelp://search/<alias> |
The edges close into one connected graph. A suggestion fans out into a place
search and, for a business suggestion, straight to that business; a search card
walks through to its full business; a business reaches its reviews and a
same-category search; a review reaches back to its business and on to the
reviewer's own profile; a category fans into a search. No node is left without an
outward edge, so a crawl started anywhere reaches the rest of the reachable site.
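The edge table is mechanical enough to express as data. The sketch below transcribes it into a rule table and mints the outgoing URIs of one record. It assumes each edge field holds the bare id or alias of its target, which the text does not state; edgeRule, rules, and Edges are hypothetical names.

```go
package main

import "fmt"

// edgeRule maps one field of a record type to the authority it points at.
type edgeRule struct {
	Field, Authority string
}

// rules transcribes the edge table: record type -> its edge fields.
var rules = map[string][]edgeRule{
	"Suggestion": {{"search_ref", "search"}, {"business", "biz"}},
	"Business":   {{"reviews_ref", "reviews"}, {"category_ref", "search"}},
	"Review":     {{"business", "biz"}, {"author_id", "user"}},
	"Category":   {{"search_ref", "search"}},
}

// Edges mints the outgoing URIs of one record, in table order. Absent or
// empty fields are skipped, for example a suggestion that is not a business.
// Assumption: each field value is the bare id of the target.
func Edges(recordType string, fields map[string]string) []string {
	var out []string
	for _, r := range rules[recordType] {
		if v := fields[r.Field]; v != "" {
			out = append(out, "yelp://"+r.Authority+"/"+v)
		}
	}
	return out
}

func main() {
	review := map[string]string{
		"business":  "garaje-san-francisco",
		"author_id": "abc123", // made-up id for illustration
	}
	for _, e := range Edges("Review", review) {
		fmt.Println(e)
	}
}
```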
Starting from any node, --follow walks these edges:
ant export yelp://search/tacos --follow 2 --to ./data # each business, then its reviews and a same-category search
ant get yelp://biz/garaje-san-francisco
ant cat yelp://biz/garaje-san-francisco # the description body
ant url yelp://biz/garaje-san-francisco
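A depth-limited walk like this is an ordinary breadth-first search with a visited set. The sketch below shows the idea under the assumption that --follow N means "visit everything within N hops of the start"; Crawl and the Fetch callback are hypothetical stand-ins for the host's exporter, and the in-memory graph replaces real fetches.

```go
package main

import "fmt"

// Fetch returns the outgoing edge URIs of the record at uri.
// It is a hypothetical stand-in for fetching a record and reading its edges.
type Fetch func(uri string) []string

// Crawl visits start and everything within depth hops of it, breadth-first,
// visiting each URI at most once. It returns the URIs in visit order.
func Crawl(start string, depth int, fetch Fetch) []string {
	seen := map[string]bool{start: true}
	order := []string{}
	frontier := []string{start}
	for hop := 0; len(frontier) > 0; hop++ {
		var next []string
		for _, uri := range frontier {
			order = append(order, uri)
			if hop == depth {
				continue // visited, but its edges are not followed
			}
			for _, e := range fetch(uri) {
				if !seen[e] {
					seen[e] = true
					next = append(next, e)
				}
			}
		}
		frontier = next
	}
	return order
}

func main() {
	// A made-up slice of the graph, shaped like the edge table.
	graph := map[string][]string{
		"yelp://search/tacos":   {"yelp://biz/a", "yelp://biz/b"},
		"yelp://biz/a":          {"yelp://reviews/a", "yelp://search/mexican"},
		"yelp://biz/b":          {"yelp://reviews/b", "yelp://search/mexican"},
		"yelp://search/mexican": {"yelp://biz/a", "yelp://biz/c"},
	}
	for _, uri := range Crawl("yelp://search/tacos", 2, func(u string) []string { return graph[u] }) {
		fmt.Println(uri)
	}
}
```

The visited set is what keeps the walk finite: the edges form cycles (a review points back at its business), so without it the crawl would revisit the same records indefinitely.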
Each record is written under its minted URI with its edges intact, so the saved set reconstructs the slice of the site that was reached: the search results, the full business behind each card, its reviews, and the profile of each reviewer.
These edge fields stay out of the table and CSV views (they would be noise in a terminal) but are always present in the JSON and JSONL a host reads.
Why this is the same code
The driver and the binary share one definition per operation. A resolver op
answers both yelp biz on the command line and ant get yelp://biz/... through a
host, from the same handler and the same client. There is no second
implementation to keep in step.
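The shape of that sharing can be shown with one operation table and two thin front ends. This is an illustrative sketch only: Op, ops, FromArgs, and FromURI are hypothetical, and the handler echoes its input where a real one would call the shared client.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Op is one operation, defined once: a name and the handler behind it.
type Op struct {
	Name string
	Run  func(id string) (string, error)
}

// ops is the single table both front ends read.
var ops = map[string]Op{
	"biz": {Name: "biz", Run: func(alias string) (string, error) {
		// A real handler would fetch the record; this one only echoes.
		return "business:" + alias, nil
	}},
}

// FromArgs is the command-line front end: ["biz", "<alias>"].
func FromArgs(args []string) (string, error) {
	if len(args) != 2 {
		return "", fmt.Errorf("usage: yelp <op> <id>")
	}
	op, ok := ops[args[0]]
	if !ok {
		return "", fmt.Errorf("unknown op %q", args[0])
	}
	return op.Run(args[1])
}

// FromURI is the host front end: yelp://biz/<alias>.
func FromURI(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	op, ok := ops[u.Host]
	if !ok {
		return "", fmt.Errorf("unknown authority %q", u.Host)
	}
	return op.Run(strings.TrimPrefix(u.Path, "/"))
}

func main() {
	a, _ := FromArgs([]string{"biz", "garaje-san-francisco"})
	b, _ := FromURI("yelp://biz/garaje-san-francisco")
	fmt.Println(a == b, a)
}
```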