twisted.python.url.URL(object)
A URL represents a URL and provides a convenient API for modifying its parts.
A URL is split into a number of distinct parts: scheme, host, port, path segments, query parameters and fragment identifier:
http://example.com:8080/a/b/c?d=e#f
^ scheme           ^ port     ^ query parameters
       ^ host           ^ path segments
                                  ^ fragment
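As a rough cross-check of the parts labeled above, the same example can be split with the Python 3 standard library. This is only a sketch using urllib.parse, not how URL itself is implemented, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qsl

parts = urlsplit("http://example.com:8080/a/b/c?d=e#f")

scheme = parts.scheme      # the part before "://"
host = parts.hostname      # the host name without the port
port = parts.port          # the port, parsed as an int
# Drop the leading "/" so that only the segments remain, mirroring the
# way URL.path is described as a tuple of segments.
path_segments = tuple(parts.path.lstrip("/").split("/"))
# Query parameters as (name, value) pairs.
query = tuple(parse_qsl(parts.query, keep_blank_values=True))
fragment = parts.fragment  # the part after "#"

print(scheme, host, port, path_segments, query, fragment)
```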
You can construct URL objects by passing in these components directly, like so:
>>> from twisted.python.url import URL
>>> URL(scheme=u'https', host=u'example.com',
...     path=[u'hello', u'world'])
URL.fromText(u'https://example.com/hello/world')
Or you can use the fromText method you can see in the output there:
>>> URL.fromText(u'https://example.com/hello/world')
URL.fromText(u'https://example.com/hello/world')
There are two major advantages of using URL over representing URLs as strings. The first is that it's really easy to evaluate a relative hyperlink, for example, when crawling documents, to figure out what is linked:
>>> URL.fromText(u'https://example.com/base/uri/').click(u"/absolute")
URL.fromText(u'https://example.com/absolute')
>>> (URL.fromText(u'https://example.com/base/uri/')
...  .click(u"relative/path"))
URL.fromText(u'https://example.com/base/uri/relative/path')
The other is that URLs have two normalizations. One representation is suitable for humans to read, because it can represent data from many character sets; this is the Internationalized, or IRI, normalization. The other is the older, US-ASCII-only representation, which is necessary for most contexts where you would need to put a URI. You can convert between these representations according to certain rules. URL exposes these conversions as methods:
>>> URL.fromText(u"https://→example.com/foo⇧bar/").asURI()
URL.fromText(u'https://xn--example-dk9c.com/foo%E2%87%A7bar/')
>>> (URL.fromText(u'https://xn--example-dk9c.com/foo%E2%87%A7bar/')
...  .asIRI())
URL.fromText(u'https://\u2192example.com/foo\u21e7bar/')
See Also | RFC 3986, Uniform Resource Identifier (URI): Generic Syntax; RFC 3987, Internationalized Resource Identifiers |
Instance Variable | scheme | The URI scheme. (type: unicode) |
Instance Variable | user | The username portion of the URL, if specified; otherwise the empty string. (type: unicode) |
Instance Variable | userinfo | The username and password portions of the URL, if specified, separated by a colon; otherwise the empty string. (type: unicode) |
Instance Variable | host | The host name. (type: unicode) |
Instance Variable | port | The port number. (type: int) |
Instance Variable | path | The path segments. (type: tuple of unicode) |
Instance Variable | query | The query parameters, as (name, value) pairs. (type: tuple of 2-tuples of (name: unicode, value: unicode, or None for stand-alone query parameters with no = in them)) |
Instance Variable | fragment | The fragment identifier. (type: unicode) |
Instance Variable | rooted | Does the path start with a /? The name is taken from the terminology in the BNF grammar, specifically the path-rootless rule, since "absolute path" and "absolute URI" are somewhat ambiguous. path does not contain the implicit prefixed "/", since that is somewhat awkward to work with. (type: bool) |
Method | __init__ | Create a new URL from structured information about itself. |
Method | user | The user portion of userinfo; everything up to the first ":". |
Method | authority | Compute and return the appropriate host/port/userinfo combination. |
Method | __eq__ | URLs are equal to URL objects whose attributes are equal. |
Method | __ne__ | URLs are unequal to URL objects whose attributes are unequal. |
Method | absolute | Is this URL complete enough to resolve a resource without resolution relative to a base-URI? |
Method | replace | Make a new instance of self.__class__, passing along the given arguments to its constructor. |
Class Method | fromText | Parse the given string into a URL object. |
Method | child | Construct a URL where the given path segments are a child of this URL, preserving the query and fragment. |
Method | sibling | Construct a URL where the given path segment is a sibling of this URL. |
Method | click | Resolve the given URI reference relative to this (base) URI. |
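Method | asURI | Convert a URL object that potentially contains non-ASCII characters into a URL object where all non-ASCII text has been encoded appropriately. |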
Method | asIRI | Convert a URL object that potentially contains text that has been percent-encoded or IDNA encoded into a URL object containing the text as it should be presented to a human for reading. |
Method | asText | Convert this URL to its canonical textual representation. |
Method | __repr__ | Convert this URL to an eval-able representation that shows all of its constituent parts. |
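Method | add | Create a new URL with a given query argument, name, added to it with the value value. |
Method | set | Create a new URL with all existing occurrences of the query argument name, if any, removed, then add the argument with the given value. |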
Method | get | Retrieve a list of values for the given named query parameter. |
Method | remove | Create a new URL with all query arguments with the given name removed. |
Create a new URL from structured information about itself.
Is this URL complete enough to resolve a resource without resolution relative to a base-URI?
Make a new instance of self.__class__, passing along the given arguments to its constructor.
Parameters | scheme | the scheme of the new URL; if unspecified, the scheme of this URL. (type: unicode) |
Parameters | host | the host of the new URL; if unspecified, the host of this URL. (type: unicode) |
Parameters | path | the path segments of the new URL; if unspecified, the path segments of this URL. (type: iterable of unicode) |
Parameters | query | the query elements of the new URL; if unspecified, the query elements of this URL. (type: iterable of 2-tuples of key, value) |
Parameters | fragment | the fragment of the new URL; if unspecified, the fragment of this URL. (type: unicode) |
Parameters | port | the port of the new URL; if unspecified, the port of this URL. (type: int) |
Parameters | rooted | True if the given path segments are meant to start at the root of the host; False otherwise. Only meaningful for relative URIs. (type: bool) |
Parameters | userinfo | A string indicating information about an authenticated user. (type: unicode) |
Returns | a new URL. |
Construct a URL where the given path segments are a child of this URL, preserving the query and fragment.
For example:
>>> (URL.fromText(u"http://localhost/a/b?x=y")
...  .child(u"c", u"d").asText())
u'http://localhost/a/b/c/d?x=y'
Parameters | segments | The path segments to append. (type: tuple of unicode) |
Returns | a new URL with the additional path segments. (type: URL) |
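The path handling described above can be modeled on plain tuples of segments. The helper below is hypothetical and is not the library's code; in particular, the handling of a trailing empty segment is an assumption made for this sketch.

```python
def child_path(path, *segments):
    """Append segments to a tuple of path segments (hypothetical helper)."""
    path = tuple(path)
    # Assumption: a trailing empty segment (left by a trailing "/") is
    # replaced rather than kept, so "/a/b/" plus "c" gives "/a/b/c"
    # instead of "/a/b//c".
    if path and path[-1] == "":
        path = path[:-1]
    return path + segments


# The query and fragment are untouched because only the path changes.
print(child_path(("a", "b"), "c", "d"))
print(child_path(("a", "b", ""), "c"))
```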
Resolve the given URI reference relative to this (base) URI.
The resulting URI should match what a web browser would generate if you click on href in the context of this URI.
Parameters | href | a URI reference (type: unicode or ASCII str) |
Returns | a new absolute URL |
See Also | RFC 3986 section 5, Reference Resolution |
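To get a feel for the kinds of references that RFC 3986 section 5 covers, the sketch below resolves a few of them with the Python 3 standard library's urllib.parse.urljoin. It is a stand-in for illustration, not the click method itself, and the base URL and references are example values only.

```python
from urllib.parse import urljoin

base = "https://example.com/base/uri/"

# Map each reference to the result of resolving it against `base`.
resolved = {href: urljoin(base, href) for href in (
    "/absolute",          # absolute path: replaces the whole base path
    "relative/path",      # relative path: appended to the base directory
    "../sibling",         # dot segments are collapsed
    "?q=1",               # query only: keeps the base path
    "//other.example/x",  # network-path reference: keeps only the scheme
)}

for href, target in resolved.items():
    print(f"{href!r:22} -> {target}")
```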
Convert a URL object that potentially contains non-ASCII characters into a URL object where all non-ASCII text has been encoded appropriately. This is useful to do in preparation for sending a URL, or portions of it, over a wire protocol. For example:
>>> URL.fromText(u"https://→example.com/foo⇧bar/").asURI()
URL.fromText(u'https://xn--example-dk9c.com/foo%E2%87%A7bar/')
Returns | a new URL with its path-segments, query-parameters, and hostname appropriately encoded, so that they are all in the US-ASCII range. (type: URL) |
Convert a URL object that potentially contains text that has been percent-encoded or IDNA encoded into a URL object containing the text as it should be presented to a human for reading.
For example:
>>> (URL.fromText(u'https://xn--example-dk9c.com/foo%E2%87%A7bar/')
...  .asIRI())
URL.fromText(u'https://\u2192example.com/foo\u21e7bar/')
Returns | a new URL with its path-segments, query-parameters, and hostname appropriately decoded. (type: URL) |
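The two encodings involved, IDNA for the host name and percent-encoding for the path, can be tried in isolation with the Python 3 standard library. This is a minimal sketch of the underlying transformations, assuming Python's built-in "idna" codec and urllib.parse; it is not a description of how asURI or asIRI are implemented.

```python
from urllib.parse import quote, unquote

iri_host = "\u2192example.com"  # the host from the example above
iri_segment = "foo\u21e7bar"    # the path segment from the example above

# IRI -> URI: IDNA-encode the host and percent-encode the UTF-8 bytes
# of the path segment.
uri_host = iri_host.encode("idna").decode("ascii")
uri_segment = quote(iri_segment, safe="")

# URI -> IRI: reverse both steps.
round_trip_host = uri_host.encode("ascii").decode("idna")
round_trip_segment = unquote(uri_segment)

print(uri_host, uri_segment)
```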
Convert this URL to its canonical textual representation.
Parameters | includeSecrets | Should the returned textual representation include potentially sensitive information? The default, False, if not; True if so. Quoting from RFC 3986, section 3.2.1: "Applications should not render as clear text any data after the first colon (":") character found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password)." (type: bool) |
Returns | The serialized textual representation of this URL, such as u"http://example.com/some/path?some=query". (type: unicode) |
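One way to follow the RFC 3986 recommendation quoted above is to cut the userinfo string at its first colon. The helper names below are hypothetical and the exact output format is an assumption of this sketch, not something the documentation specifies.

```python
def user_of(userinfo):
    """Return everything up to the first ':' (hypothetical helper)."""
    return userinfo.partition(":")[0]


def redact_userinfo(userinfo):
    """Drop any data after the first ':' while keeping the colon itself.

    Assumption: the colon is kept so a reader can still tell that a
    password was present; the text after it is never rendered.
    """
    user, colon, _password = userinfo.partition(":")
    return user + colon


print(user_of("alice:s3cret"))
print(redact_userinfo("alice:s3cret"))
```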
Convert this URL to an eval-able representation that shows all of its constituent parts.
Create a new URL with a given query argument, name, added to it with the value value, like so:
>>> URL.fromText(u'https://example.com/?x=y').add(u'x')
URL.fromText(u'https://example.com/?x=y&x')
>>> URL.fromText(u'https://example.com/?x=y').add(u'x', u'z')
URL.fromText(u'https://example.com/?x=y&x=z')
Parameters | name | The name (the part before the =) of the query parameter to add. (type: unicode) |
Parameters | value | The value (the part after the =) of the query parameter to add. (type: unicode) |
Returns | a new URL with the parameter added. |
Create a new URL with all existing occurrences of the query argument name, if any, removed, then add the argument with the given value, like so:
>>> URL.fromText(u'https://example.com/?x=y').set(u'x')
URL.fromText(u'https://example.com/?x')
>>> URL.fromText(u'https://example.com/?x=y').set(u'x', u'z')
URL.fromText(u'https://example.com/?x=z')
Parameters | name | The name (the part before the =) of the query parameter to set. (type: unicode) |
Parameters | value | The value (the part after the =) of the query parameter to set. (type: unicode) |
Returns | a new URL with the parameter added or changed. |
Retrieve a list of values for the given named query parameter.
Parameters | name | The name of the query parameter to retrieve. (type: unicode ) |
Returns | all the values associated with the key; for example, for the query string u"x=1&x=2", url.get(u"x") would return [u'1', u'2']; url.get(u"y") (since there is no "y" parameter) would return the empty list, []. (type: list of unicode) |
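The query methods described above can be modeled on a plain tuple of (name, value) pairs, where a value of None stands for a parameter with no = in it. The functions below are a hypothetical sketch that follows the wording of the descriptions; the library may differ in details such as where set places the new pair.

```python
def add(query, name, value=None):
    """Return a new tuple with (name, value) appended."""
    return tuple(query) + ((name, value),)


def remove(query, name):
    """Return a new tuple without any pair called `name`."""
    return tuple((k, v) for k, v in query if k != name)


def set_(query, name, value=None):
    """Remove every existing `name`, then add the new pair.

    Assumption: the new pair is appended at the end.
    """
    return add(remove(query, name), name, value)


def get(query, name):
    """Return the list of values stored under `name` (possibly empty)."""
    return [v for k, v in query if k == name]


query = (("x", "y"),)
print(add(query, "x"))        # both pairs are kept
print(set_(query, "x", "z"))  # the old value is replaced
print(get((("x", "1"), ("x", "2")), "x"))
```

Every function returns a new tuple rather than changing its argument, which matches the way each documented method returns a new URL.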