hfHubDownload function

Future<String> hfHubDownload({
  required String repoId,
  required String filename,
  String? subfolder,
  String? repoType,
  String? revision,
  String? libraryName,
  String? libraryVersion,
  String? cacheDir,
  String? localDir,
  dynamic userAgent,
  bool forceDownload = false,
  Map<String, String>? proxies,
  double etagTimeout = constants.DEFAULT_ETAG_TIMEOUT,
  dynamic token,
  bool localFilesOnly = false,
  Map<String, String>? headers,
  String? endpoint,
})

Download a given file if it's not already present in the local cache.

The new cache file layout looks like this:

  • The cache directory contains one subfolder per repo_id (namespaced by repo type)
  • inside each repo folder:
    • refs is a list of the latest known revision => commit_hash pairs
    • blobs contains the actual file blobs (identified by their git-sha or sha256, depending on whether they're LFS files or not)
    • snapshots contains one subfolder per commit, each "commit" contains the subset of the files that have been resolved at that particular commit. Each filename is a symlink to the blob at that particular commit.
[  96]  .
└── [ 160]  models--julien-c--EsperBERTo-small
    ├── [ 160]  blobs
    │   ├── [321M]  403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
    │   ├── [ 398]  7cb18dc9bafbfcf74629a4b760af1b160957a83e
    │   └── [1.4K]  d7edf6bd2a681fb0175f7735299831ee1b22b812
    ├── [  96]  refs
    │   └── [  40]  main
    └── [ 128]  snapshots
        ├── [ 128]  2439f60ef33a0d46d85da5001d52aeda5b00ce9f
        │   ├── [  52]  README.md -> ../../blobs/d7edf6bd2a681fb0175f7735299831ee1b22b812
        │   └── [  76]  pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
        └── [ 128]  bbc77c8132af1cc5cf678da3f1ddf2de43606d48
            ├── [  52]  README.md -> ../../blobs/7cb18dc9bafbfcf74629a4b760af1b160957a83e
            └── [  76]  pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
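The layout above can be expressed as two small path helpers. This is a minimal sketch that assumes the Python huggingface_hub naming convention, in which the repo folder name is the pluralized repo type followed by the repo id, all joined with `--` (as in `models--julien-c--EsperBERTo-small`). The helpers `repoFolderName` and `snapshotFilePath` are hypothetical and are not part of this API.

```dart
/// Hypothetical helper: builds the per-repo folder name used in the cache,
/// e.g. 'models--julien-c--EsperBERTo-small'.
String repoFolderName({required String repoId, required String repoType}) {
  // Assumption: pluralize the repo type ('model' -> 'models'), split the
  // namespaced repo id on '/', and join all parts with '--'.
  final parts = ['${repoType}s', ...repoId.split('/')];
  return parts.join('--');
}

/// Hypothetical helper: path of a file inside one snapshot folder.
/// The entry at this path is expected to be a symlink into `blobs/`.
String snapshotFilePath({
  required String cacheDir,
  required String repoId,
  required String repoType,
  required String commitHash,
  required String filename,
}) {
  final repoFolder = repoFolderName(repoId: repoId, repoType: repoType);
  return '$cacheDir/$repoFolder/snapshots/$commitHash/$filename';
}
```

Because each snapshot holds only symlinks, a file that is unchanged between two commits (such as `pytorch_model.bin` in the tree above) is stored once in `blobs`.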

If localDir is provided, the file structure of the repo is replicated in this location. With this option, the file is not stored in the cacheDir layout described above. Instead, a .cache/huggingface/ folder is created at the root of localDir to store metadata about the downloaded files. This mechanism is less robust than the main cache system, but it is optimized for regularly pulling the latest version of a repository.
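The paragraph above implies two locations for a localDir download: the destination file, which mirrors its path in the repo, and the metadata folder at the root of localDir. The following is a minimal sketch under that reading; `LocalDirPaths` and `localDirPaths` are hypothetical names, and the exact metadata file names the implementation writes are not modeled here.

```dart
/// Hypothetical container for the two locations involved in a
/// localDir download.
class LocalDirPaths {
  final String filePath; // where the downloaded file is written
  final String metadataRoot; // folder holding download metadata
  const LocalDirPaths(this.filePath, this.metadataRoot);
}

/// Hypothetical helper: computes both locations for [filename] under [localDir].
LocalDirPaths localDirPaths(String localDir, String filename) {
  // Drop a trailing slash so paths are not joined with '//'.
  final root = localDir.endsWith('/')
      ? localDir.substring(0, localDir.length - 1)
      : localDir;
  // `filename` may already include a subfolder ('sub/file.txt'); the repo
  // structure is replicated as-is under `localDir`.
  return LocalDirPaths('$root/$filename', '$root/.cache/huggingface');
}
```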

Args:
  • repoId (String): A user or organization name and a repo name, separated by a /.
  • filename (String): The name of the file in the repo.
  • subfolder (String, optional): A folder inside the repo that contains the file. An empty string is treated as null.
  • repoType (String, optional): Set to "dataset" or "space" when downloading from a dataset or Space; null or "model" when downloading from a model. Defaults to null, which is treated as "model".
  • revision (String, optional): A Git revision: a branch name, a tag, or a commit hash. Defaults to constants.DEFAULT_REVISION.
  • libraryName (String, optional): The name of the library to which the object corresponds.
  • libraryVersion (String, optional): The version of the library.
  • cacheDir (String, optional): Path to the folder where cached files are stored. Defaults to constants.HF_HUB_CACHE.
  • localDir (String, optional): If provided, the downloaded file is placed under this directory.
  • userAgent (Map or String, optional): User-agent information, given as a map or a string.
  • forceDownload (bool, defaults to false): Whether to download the file even if it already exists in the local cache.
  • proxies (Map<String, String>, optional): Maps a protocol to the URL of the proxy used for the request.
  • etagTimeout (double, defaults to constants.DEFAULT_ETAG_TIMEOUT, i.e. 10): When fetching the ETag, how many seconds to wait for the server to send data before giving up. If constants.HF_HUB_ETAG_TIMEOUT (set from the environment) differs from the default, it takes precedence over this argument.
  • token (String or bool, optional): The token to use for the download.
    • If true, the token is read from the Hugging Face config folder.
    • If a string, it is used as the authentication token.
  • localFilesOnly (bool, defaults to false): If true, do not download the file; return the path to the locally cached file if it exists.
  • headers (Map<String, String>, optional): Additional headers to send with the request.
  • endpoint (String, optional): The Hub endpoint to download from. If null, the default Hub endpoint is used.
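The two documented modes of the token argument can be sketched as a small resolver. This is illustrative only: `resolveToken` and its `readStoredToken` callback are hypothetical stand-ins for reading the token from the Hugging Face config folder, and `StateError` stands in for the EnvironmentError listed under Raises. The documentation does not say how null or false are handled, so returning null for them is an assumption of this sketch.

```dart
/// Hypothetical sketch of resolving the `token` argument.
/// [readStoredToken] stands in for reading the token saved in the
/// Hugging Face config folder; it returns null if none is saved.
String? resolveToken(dynamic token, String? Function() readStoredToken) {
  if (token is String) {
    // A string is used directly as the authentication token.
    return token;
  }
  if (token == true) {
    // `true` means: read the token from the config folder and fail if it
    // is missing (StateError stands in for EnvironmentError here).
    final stored = readStoredToken();
    if (stored == null) {
      throw StateError('token is true but no saved token was found.');
    }
    return stored;
  }
  // Assumption: null and false both mean "send no token" in this sketch.
  return null;
}
```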

Returns: Future<String>: The local path of the file or, if networking is off, the path of the last version of the file cached on disk.

Raises:
  • RepositoryNotFoundError: If the repository to download from cannot be found. This may be because it doesn't exist, or because it is private and you do not have access.
  • RevisionNotFoundError: If the revision to download from cannot be found.
  • EntryNotFoundError: If the file to download cannot be found.
  • LocalEntryNotFoundError: If the network is disabled or unavailable and the file is not found in the cache.
  • EnvironmentError: If token is true but the token cannot be found.
  • OSError: If the ETag cannot be determined.
  • ArgumentError: If a parameter value is invalid, for example an unsupported repoType.

Implementation

Future<String> hfHubDownload({
  required String repoId,
  required String filename,
  String? subfolder,
  String? repoType,
  String? revision,
  String? libraryName,
  String? libraryVersion,
  String? cacheDir,
  String? localDir,
  dynamic userAgent,
  bool forceDownload = false,
  Map<String, String>? proxies,
  double etagTimeout = constants.DEFAULT_ETAG_TIMEOUT,
  dynamic token,
  bool localFilesOnly = false,
  Map<String, String>? headers,
  String? endpoint,
}) async {
  if (constants.HF_HUB_ETAG_TIMEOUT != constants.DEFAULT_ETAG_TIMEOUT) {
    // Respect environment variable above user value
    etagTimeout = constants.HF_HUB_ETAG_TIMEOUT.toDouble();
  }

  cacheDir ??= constants.HF_HUB_CACHE;
  revision ??= constants.DEFAULT_REVISION;

  subfolder = subfolder?.isEmpty == true ? null : subfolder;
  if (subfolder != null) {
    // This is used to create a URL, and not a local path, hence the forward slash.
    filename = '$subfolder/$filename';
  }

  repoType ??= 'model';
  if (!constants.REPO_TYPES.contains(repoType)) {
    throw ArgumentError('Invalid repo type: $repoType. Accepted repo types are: ${constants.REPO_TYPES}');
  }

  final Map<String, String> hfHeaders = await buildHfHeaders(
    token: token,
    libraryName: libraryName,
    libraryVersion: libraryVersion,
    userAgent: userAgent,
    headers: headers,
  );

  if (localDir != null) {
    return _hfHubDownloadToLocalDir(
      // Destination
      localDir: localDir,
      // File info
      repoId: repoId,
      repoType: repoType,
      filename: filename,
      revision: revision,
      // HTTP info
      endpoint: endpoint,
      etagTimeout: etagTimeout,
      headers: hfHeaders,
      proxies: proxies,
      token: token,
      // Additional options
      cacheDir: cacheDir,
      forceDownload: forceDownload,
      localFilesOnly: localFilesOnly,
    );
  }

  return await _hfHubDownloadToCacheDir(
    // Destination
    cacheDir: cacheDir,
    // File info
    repoId: repoId,
    filename: filename,
    repoType: repoType,
    revision: revision,
    // HTTP info
    endpoint: endpoint,
    etagTimeout: etagTimeout,
    headers: hfHeaders,
    proxies: proxies,
    token: token,
    // Additional options
    localFilesOnly: localFilesOnly,
    forceDownload: forceDownload,
  );
}
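The comment in the subfolder branch notes that the joined filename is used to build a URL, not a local path. A minimal sketch of such a URL follows, modeled on the Python huggingface_hub layout {endpoint}/{prefix}{repoId}/resolve/{revision}/{filename}, where the prefix is datasets/ or spaces/ for non-model repos. `resolveUrlSketch` is hypothetical, and this port's actual URL builder may differ.

```dart
/// Hypothetical URL builder modeled on the Hub's `resolve` endpoint.
String resolveUrlSketch({
  required String endpoint,
  required String repoId,
  required String filename,
  String repoType = 'model',
  // Assumption: 'main' mirrors the usual value of constants.DEFAULT_REVISION.
  String revision = 'main',
}) {
  // Non-model repos are addressed under a type prefix.
  const prefixes = {'dataset': 'datasets/', 'space': 'spaces/'};
  final prefixedRepoId = '${prefixes[repoType] ?? ''}$repoId';
  // A revision may itself contain '/', e.g. 'refs/pr/1', so it is
  // encoded as a single path segment.
  final encodedRevision = Uri.encodeComponent(revision);
  // The filename keeps its '/' separators (which is why the subfolder is
  // joined with a forward slash); each segment is encoded separately.
  final encodedFilename =
      filename.split('/').map(Uri.encodeComponent).join('/');
  return '$endpoint/$prefixedRepoId/resolve/$encodedRevision/$encodedFilename';
}
```

Encoding the revision as one segment keeps a branch such as refs/pr/1 from being read as extra path components, while the filename's slashes are preserved so that subfolder paths resolve correctly.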