hfHubDownload function
- required String repoId,
- required String filename,
- String? subfolder,
- String? repoType,
- String? revision,
- String? libraryName,
- String? libraryVersion,
- String? cacheDir,
- String? localDir,
- dynamic userAgent,
- bool forceDownload = false,
- Map<String, String>? proxies,
- double etagTimeout = constants.DEFAULT_ETAG_TIMEOUT,
- dynamic token,
- bool localFilesOnly = false,
- Map<String, String>? headers,
- String? endpoint,
Download a given file if it's not already present in the local cache.
The new cache file layout looks like this:
- The cache directory contains one subfolder per repo_id (namespaced by repo type)
- Inside each repo folder:
  - refs is a list of the latest known revision => commit_hash pairs
  - blobs contains the actual file blobs (identified by their git-sha or sha256, depending on whether they're LFS files or not)
  - snapshots contains one subfolder per commit. Each "commit" contains the subset of the files that have been resolved at that particular commit. Each filename is a symlink to the blob at that particular commit.
[  96]  .
└── [ 160]  models--julien-c--EsperBERTo-small
    ├── [ 160]  blobs
    │   ├── [321M]  403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
    │   ├── [ 398]  7cb18dc9bafbfcf74629a4b760af1b160957a83e
    │   └── [1.4K]  d7edf6bd2a681fb0175f7735299831ee1b22b812
    ├── [  96]  refs
    │   └── [  40]  main
    └── [ 128]  snapshots
        ├── [ 128]  2439f60ef33a0d46d85da5001d52aeda5b00ce9f
        │   ├── [  52]  README.md -> ../../blobs/d7edf6bd2a681fb0175f7735299831ee1b22b812
        │   └── [  76]  pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
        └── [ 128]  bbc77c8132af1cc5cf678da3f1ddf2de43606d48
            ├── [  52]  README.md -> ../../blobs/7cb18dc9bafbfcf74629a4b760af1b160957a83e
            └── [  76]  pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
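The layout above can be sketched as plain path arithmetic. The helpers below are hypothetical (they are not part of this package), and the naming rule, repo type plus "s" followed by the repo id parts joined with "--", is an assumption inferred from the example folder `models--julien-c--EsperBERTo-small`.

```dart
// Hypothetical helpers for illustration only; not part of the package API.

/// Builds the per-repo folder name used inside the cache directory.
/// Assumption: "<repoType>s" and the repo id segments are joined with "--".
String repoFolderName(String repoId, {String repoType = 'model'}) {
  return ['${repoType}s', ...repoId.split('/')].join('--');
}

/// Builds the path of a file inside the snapshot folder of one commit.
String snapshotPath(
  String cacheDir,
  String repoId,
  String commitHash,
  String filename, {
  String repoType = 'model',
}) {
  return [
    cacheDir,
    repoFolderName(repoId, repoType: repoType),
    'snapshots',
    commitHash,
    filename,
  ].join('/');
}

void main() {
  // prints "models--julien-c--EsperBERTo-small"
  print(repoFolderName('julien-c/EsperBERTo-small'));

  print(snapshotPath(
    '/tmp/hf-cache',
    'julien-c/EsperBERTo-small',
    '2439f60ef33a0d46d85da5001d52aeda5b00ce9f',
    'README.md',
  ));
}
```

On disk, the snapshot entry is a symlink into blobs, so two commits that share an unchanged file (such as pytorch_model.bin in the tree above) store it only once.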
If local_dir is provided, the file structure from the repo will be replicated in this location. When using this option, the cache_dir will not be used and a .cache/huggingface/ folder will be created at the root of local_dir to store some metadata related to the downloaded files. While this mechanism is not as robust as the main cache-system, it's optimized for regularly pulling the latest version of a repository.
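As a rough illustration of the local_dir behaviour described above, the sketch below computes where a repo file and the metadata folder would sit. Both helper names are hypothetical, and the exact contents of the metadata folder are not covered here.

```dart
// Hypothetical helpers for illustration only; not part of the package API.

/// Removes trailing slashes so joined paths do not contain "//".
String _trimTrailingSlash(String dir) => dir.replaceAll(RegExp(r'/+$'), '');

/// Path of a repo file under localDir, mirroring the repo's own structure.
String localFilePath(String localDir, String filename) =>
    '${_trimTrailingSlash(localDir)}/$filename';

/// Folder at the root of localDir that holds download metadata.
String localMetadataDir(String localDir) =>
    '${_trimTrailingSlash(localDir)}/.cache/huggingface';

void main() {
  print(localFilePath('/data/esperberto/', 'onnx/model.onnx'));
  print(localMetadataDir('/data/esperberto/'));
}
```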
Args:
  repo_id (str):
    A user or an organization name and a repo name separated by a /.
  filename (str):
    The name of the file in the repo.
  subfolder (str, optional):
    An optional value corresponding to a folder inside the model repo.
  repo_type (str, optional):
    Set to "dataset" or "space" if downloading from a dataset or space, None or "model" if downloading from a model. Default is None.
  revision (str, optional):
    An optional Git revision id which can be a branch name, a tag, or a commit hash.
  library_name (str, optional):
    The name of the library to which the object corresponds.
  library_version (str, optional):
    The version of the library.
  cache_dir (str, Path, optional):
    Path to the folder where cached files are stored.
  local_dir (str or Path, optional):
    If provided, the downloaded file will be placed under this directory.
  user_agent (dict, str, optional):
    The user-agent info in the form of a dictionary or a string.
  force_download (bool, optional, defaults to False):
    Whether the file should be downloaded even if it already exists in the local cache.
  proxies (dict, optional):
    Dictionary mapping protocol to the URL of the proxy passed to requests.request.
  etag_timeout (float, optional, defaults to 10):
    When fetching ETag, how many seconds to wait for the server to send data before giving up. This value is passed to requests.request.
  token (str, bool, optional):
    A token to be used for the download.
    - If True, the token is read from the HuggingFace config folder.
    - If a string, it's used as the authentication token.
  local_files_only (bool, optional, defaults to False):
    If True, avoid downloading the file and return the path to the local cached file if it exists.
  headers (dict, optional):
    Additional headers to be sent with the request.
Returns:
  str: Local path of the file or, if networking is off, the last version of the file cached on disk.
Raises:
  `~utils.RepositoryNotFoundError`
    If the repository to download from cannot be found. This may be because it doesn't exist, or because it is set to private and you do not have access.
  `~utils.RevisionNotFoundError`
    If the revision to download from cannot be found.
  `~utils.EntryNotFoundError`
    If the file to download cannot be found.
  `~utils.LocalEntryNotFoundError`
    If network is disabled or unavailable and file is not found in cache.
  EnvironmentError
    If token=True but the token cannot be found.
  OSError
    If ETag cannot be determined.
  ValueError
    If some parameter value is invalid.
Implementation
Future<String> hfHubDownload({
  required String repoId,
  required String filename,
  String? subfolder,
  String? repoType,
  String? revision,
  String? libraryName,
  String? libraryVersion,
  String? cacheDir,
  String? localDir,
  dynamic userAgent,
  bool forceDownload = false,
  Map<String, String>? proxies,
  double etagTimeout = constants.DEFAULT_ETAG_TIMEOUT,
  dynamic token,
  bool localFilesOnly = false,
  Map<String, String>? headers,
  String? endpoint,
}) async {
  if (constants.HF_HUB_ETAG_TIMEOUT != constants.DEFAULT_ETAG_TIMEOUT) {
    // Respect environment variable above user value
    etagTimeout = constants.HF_HUB_ETAG_TIMEOUT.toDouble();
  }

  cacheDir ??= constants.HF_HUB_CACHE;
  revision ??= constants.DEFAULT_REVISION;

  subfolder = subfolder?.isEmpty == true ? null : subfolder;
  if (subfolder != null) {
    // This is used to create a URL, and not a local path, hence the forward slash.
    filename = '$subfolder/$filename';
  }

  repoType ??= 'model';
  if (!constants.REPO_TYPES.contains(repoType)) {
    throw ArgumentError(
      'Invalid repo type: $repoType. Accepted repo types are: ${constants.REPO_TYPES}',
    );
  }

  final Map<String, String> hfHeaders = await buildHfHeaders(
    token: token,
    libraryName: libraryName,
    libraryVersion: libraryVersion,
    userAgent: userAgent,
    headers: headers,
  );

  if (localDir != null) {
    return _hfHubDownloadToLocalDir(
      // Destination
      localDir: localDir,
      // File info
      repoId: repoId,
      repoType: repoType,
      filename: filename,
      revision: revision,
      // HTTP info
      endpoint: endpoint,
      etagTimeout: etagTimeout,
      headers: hfHeaders,
      proxies: proxies,
      token: token,
      // Additional options
      cacheDir: cacheDir,
      forceDownload: forceDownload,
      localFilesOnly: localFilesOnly,
    );
  }

  return await _hfHubDownloadToCacheDir(
    // Destination
    cacheDir: cacheDir,
    // File info
    repoId: repoId,
    filename: filename,
    repoType: repoType,
    revision: revision,
    // HTTP info
    endpoint: endpoint,
    etagTimeout: etagTimeout,
    headers: hfHeaders,
    proxies: proxies,
    token: token,
    // Additional options
    localFilesOnly: localFilesOnly,
    forceDownload: forceDownload,
  );
}
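The argument normalization at the top of the function can be tried in isolation with the sketch below. It is a simplified stand-in rather than the package code: `normalizeArgs`, `NormalizedArgs`, and `repoTypes` are hypothetical names, and the list of accepted repo types is an assumption based on the values named in the Args section, since the contents of constants.REPO_TYPES are not shown here.

```dart
// Hypothetical, simplified stand-in for the normalization steps above.

// Assumption: stands in for constants.REPO_TYPES.
const List<String> repoTypes = ['model', 'dataset', 'space'];

/// Holds the normalized values that the download helpers would receive.
class NormalizedArgs {
  final String filename;
  final String repoType;
  const NormalizedArgs(this.filename, this.repoType);
}

NormalizedArgs normalizeArgs({
  required String filename,
  String? subfolder,
  String? repoType,
}) {
  // An empty subfolder is treated the same as no subfolder.
  if (subfolder != null && subfolder.isNotEmpty) {
    // The result is used to build a URL, hence the forward slash.
    filename = '$subfolder/$filename';
  }

  // A missing repo type defaults to "model"; unknown values are rejected.
  final type = repoType ?? 'model';
  if (!repoTypes.contains(type)) {
    throw ArgumentError(
      'Invalid repo type: $type. Accepted repo types are: $repoTypes',
    );
  }
  return NormalizedArgs(filename, type);
}

void main() {
  final args = normalizeArgs(filename: 'model.onnx', subfolder: 'onnx');
  print('${args.filename} ${args.repoType}'); // prints "onnx/model.onnx model"
}
```

Note that an invalid repo type surfaces as an ArgumentError in Dart, which plays the role of the ValueError listed under Raises.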