Staging Modules

These modules are located in GRID_LRT.Staging and can be used to batch-stage files on GRID Storage or to check their staging status.

GRID_LRT.Staging.srmlist

GRID_LRT.Staging.srmlist.count_files_uberftp(directory)[source]
GRID_LRT.Staging.srmlist.make_srmlist_from_gsiftpdir(gsiftpdir)[source]
GRID_LRT.Staging.srmlist.slice_dicts(srmdict, slice_size=10)[source]

Returns a dict of lists, each holding up to 10 SBNs (by default). Missing subbands are treated as empty slots: if SB009 is missing, the first list holds the 9 items from SB000 to SB008, and the next list starts at SB010.
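
The gap behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: slice_by_subband is a hypothetical stand-in, the input is assumed to be a dict keyed by subband number, the output keys (the first subband of each window) are an assumption, and the example.org links are placeholders.

```python
from collections import defaultdict


def slice_by_subband(srmdict, slice_size=10):
    """Group links into fixed-width subband windows.

    Hypothetical stand-in for slice_dicts: keys of srmdict are assumed to be
    subband numbers (ints or zero-padded strings such as '009').
    """
    slices = defaultdict(list)
    for sbn in sorted(srmdict, key=int):
        # The window depends on the subband number itself, not on how many
        # links have been seen so far, so a missing subband leaves a gap.
        window_start = (int(sbn) // slice_size) * slice_size
        slices[window_start].append(srmdict[sbn])
    return dict(slices)


# SB009 is missing: the first window holds 9 links, the next starts at SB010.
links = {"%03d" % n: "srm://example.org/L123456_SB%03d_uv.MS.tar" % n
         for n in range(12) if n != 9}
sliced = slice_by_subband(links)
print({start: len(group) for start, group in sliced.items()})
# {0: 9, 10: 2}
```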

class GRID_LRT.Staging.srmlist.srmlist(check_OBSID=True, check_location=True, link=None)[source]

Bases: list

The srmlist class is an extension of Python lists that can hold a list of srm links to data on GRID Storage (LOFAR Archive, Intermediate Storage, etc).

In addition to the regular list capabilities, it also has internal checks for the location and the OBSID of the data. When a new item is appended, these checks are done automatically. Checking OBSID is an optional argument set to True by default.
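
As a rough illustration of a list that validates items on append, the sketch below subclasses list and checks only the OBSID. CheckedLinkList is a hypothetical, simplified stand-in rather than the srmlist code; it assumes the OBSID appears in each link as 'L' followed by digits, and that a mismatch should raise ValueError.

```python
import re


class CheckedLinkList(list):
    """Hypothetical, simplified stand-in for srmlist (OBSID check only)."""

    def __init__(self, check_obsid=True, link=None):
        super().__init__()
        self.check_obsid = check_obsid
        self.obsid = None
        if link is not None:
            self.append(link)

    def append(self, item):
        # Assumption: the OBSID appears in the link as 'L' followed by digits.
        match = re.search(r"L\d+", item)
        if self.check_obsid and match:
            if self.obsid is None:
                # The first link added fixes the OBSID for the whole list.
                self.obsid = match.group(0)
            elif match.group(0) != self.obsid:
                raise ValueError("OBSID %s does not match %s"
                                 % (match.group(0), self.obsid))
        super().append(item)


links = CheckedLinkList(link="srm://example.org/data/L229507_SB150_uv.dppp.MS.tar")
links.append("srm://example.org/data/L229507_SB151_uv.dppp.MS.tar")
try:
    links.append("srm://example.org/data/L999999_SB000_uv.dppp.MS.tar")
except ValueError as err:
    print("rejected:", err)
print(len(links))
```

Because the check lives in append, every other list operation (indexing, iteration, slicing) keeps working unchanged.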

__init__(check_OBSID=True, check_location=True, link=None)[source]

__init__: Initializes the srmlist object.

Parameters:
  • check_OBSID (Boolean) – Boolean flag to check if each added link has the same OBSID
  • check_location (Boolean) – Boolean flag to check if all files are in the same location (for staging purposes)
  • link (str) – append a link to the srmlist at creation
append(item)[source]

L.append(object) – append object to end

check_location(item)[source]
check_str_location(item)[source]
count(value) → integer -- return number of occurrences of value
extend()

L.extend(iterable) – extend list by appending elements from the iterable

Returns a generator that can be used to generate links that can be staged/stated with gfal

gfal_replace(item)[source]

For each item, it creates a valid link for the gfal staging scripts

Returns a generator that yields gsiftp:// links, which can be used with globus-url-copy and uberftp
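
The generator pattern described here can be sketched as a lazy prefix swap. The function converted_links is hypothetical, and the hosts and ports below are placeholders; the real endpoints depend on the storage site and are not taken from the library.

```python
def converted_links(links, old_prefix, new_prefix):
    """Lazily yield each link with its leading old_prefix swapped for new_prefix."""
    for link in links:
        if link.startswith(old_prefix):
            yield new_prefix + link[len(old_prefix):]
        else:
            # Links that do not match are passed through unchanged.
            yield link


# Placeholder hosts and ports, for illustration only.
srm_links = ["srm://srm.example.org:8443/data/L229507_SB150_uv.dppp.MS.tar"]
gsiftp = converted_links(srm_links, "srm://srm.example.org:8443",
                         "gsiftp://gridftp.example.org:2811")
print(next(gsiftp))
```

Since the result is a generator, no link is converted until it is requested, which keeps long link lists cheap to pass around.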

gsi_replace(item)[source]

Returns a generator that can be used to generate http:// links that can be downloaded using wget

http_replace(item)[source]
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

insert()

L.insert(index, object) – insert object before index

pop([index]) → item -- remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove()

L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.

reverse()

L.reverse() – reverse IN PLACE

sbn_dict(pref='SB', suff='_')[source]

Returns a generator that creates a pair of SBN and link. Can be used to create dictionaries
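
A minimal sketch of how such pairs might be produced and collected into a dictionary follows. The function sbn_pairs is a hypothetical stand-in; it assumes the subband number is the text between the last pref and the next suff in each link, and the links are placeholders.

```python
def sbn_pairs(links, pref="SB", suff="_"):
    """Yield (subband, link) pairs; hypothetical stand-in for sbn_dict."""
    for link in links:
        # Take the text after the last `pref` and cut it at the next `suff`.
        after_pref = link.split(pref)[-1]
        yield after_pref.split(suff)[0], link


links = ["srm://example.org/data/L229507_SB150_uv.dppp.MS.tar",
         "srm://example.org/data/L229507_SB151_uv.dppp.MS.tar"]
by_subband = dict(sbn_pairs(links))
print(sorted(by_subband))
# ['150', '151']
```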

sort()

L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1

srm_replace(item)[source]
stringify_item(item)[source]
trim_spaces(item)[source]

Sometimes there are two fields in the incoming list. Only the first is taken, as long as it’s formatted properly.

GRID_LRT.Staging.stage_all_LTA

GRID_LRT.Staging.stage_all_LTA.get_stage_status(stageid)[source]
GRID_LRT.Staging.stage_all_LTA.location(filename)[source]
GRID_LRT.Staging.stage_all_LTA.main(filename, test=False)[source]
GRID_LRT.Staging.stage_all_LTA.process(urls, repl_string, match, test=False)[source]
GRID_LRT.Staging.stage_all_LTA.process_surl_line(line)[source]

Used to drop empty lines and to take the first argument of the srmfile (the srm:// link)
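
The line handling described above might look like the following sketch. The helper first_field is hypothetical and assumes fields are separated by whitespace; the links are placeholders.

```python
def first_field(line):
    """Return the first whitespace-separated field of a line, or None if blank."""
    fields = line.split()
    return fields[0] if fields else None


raw_lines = [
    "srm://example.org/data/L229507_SB150_uv.dppp.MS.tar extra-column\n",
    "\n",
    "srm://example.org/data/L229507_SB151_uv.dppp.MS.tar\n",
]
# Blank lines map to None and are filtered out.
surls = [surl for surl in map(first_field, raw_lines) if surl is not None]
print(len(surls))
# 2
```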

GRID_LRT.Staging.stage_all_LTA.replace(file_loc)[source]
GRID_LRT.Staging.stage_all_LTA.return_srmlist(filename)[source]
GRID_LRT.Staging.stage_all_LTA.state_dict(srm_dict)[source]
GRID_LRT.Staging.stage_all_LTA.strip(item)[source]

GRID_LRT.Staging.state_all

Python module to check the state of files using gfal and return their locality.

author: Ron Trompert <ron.trompert@surfsara.nl> – SURFsara
helpdesk: Grid Services <grid.support@surfsara.nl> – SURFsara

usage: python state.py

description: Display the status of each file listed in “files”. The paths should have the ‘/pnfs/…’ format. Script output:
  • ONLINE: the file is only on disk
  • NEARLINE: the file is only on tape
  • ONLINE_AND_NEARLINE: the file is on disk and tape

GRID_LRT.Staging.state_all.check_status(surl_link, verbose=True)[source]

Obtain the status of a file from the given surl.

Parameters:
  • surl_link (str) – the SURL pointing to the file
  • verbose (bool) – print the status to the terminal
Returns: (filename, status), a tuple containing the file name and its status as stored in the ‘user.status’ attribute
GRID_LRT.Staging.state_all.check_status_file(surl_list)[source]

Unimplemented task

GRID_LRT.Staging.state_all.load_file_into_srmlist(filename)[source]

Helper function that loads a file into an srmlist object (will be added to the actual srmlist class later)

GRID_LRT.Staging.state_all.main(filename, verbose=True)[source]

Main function that takes a file name and returns a list of tuples of file names and staging statuses. The input file can contain either srm:// or gsiftp:// links.

Parameters:
  • filename (str) – the name of the file holding the links whose status has to be checked
  • verbose (bool) – a toggle for printing the status of each file; True by default, which prints everything
Returns: results, a list of tuples containing the file name and the state

Usage:

>>> from GRID_LRT.Staging import state_all
>>> filename='/home/apmechev/GRIDTOOLS/GRID_LRT/GRID_LRT/tests/srm_50_sara.txt'
>>> results=state_all.main(filename)
>>> results=state_all.main(filename, verbose=False)
>>> results[0]
('L229507_SB150_uv.dppp.MS_f6fc7fc5.tar', 'ONLINE_AND_NEARLINE')
GRID_LRT.Staging.state_all.percent_staged(results)[source]

Takes a list of tuples of (srm, status), computes the fraction of files that are staged (between 0 and 1), and returns it as a float.

Usage:

>>> from GRID_LRT.Staging import state_all
>>> filename='/home/apmechev/GRIDTOOLS/GRID_LRT/GRID_LRT/tests/srm_50_sara.txt'
>>> results=state_all.main(filename, verbose=False)
>>> state_all.percent_staged(results)
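
The calculation can be sketched as below. This is an illustrative reimplementation, not the module's code: fraction_staged is a hypothetical name, and it assumes a file counts as staged when its status is ONLINE or ONLINE_AND_NEARLINE (that is, when a copy is on disk).

```python
# Assumption: these two statuses mean a copy of the file is on disk.
ON_DISK = {"ONLINE", "ONLINE_AND_NEARLINE"}


def fraction_staged(results):
    """Return the fraction (0.0 to 1.0) of (name, status) tuples that are on disk."""
    if not results:
        return 0.0
    staged = sum(1 for _name, status in results if status in ON_DISK)
    return staged / float(len(results))


sample = [("a.tar", "ONLINE_AND_NEARLINE"), ("b.tar", "NEARLINE"),
          ("c.tar", "ONLINE"), ("d.tar", "NEARLINE")]
print(fraction_staged(sample))
# 0.5
```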

GRID_LRT.Staging.stager_access

This module uses an xmlrpc proxy to talk to and authenticate with the remote staging service. Your account credentials are read from the awlofar catalog Environment.cfg, if present; otherwise they can be provided in a .stagingrc file in your home directory.

Important: please do not talk directly to the xmlrpc interface, but use this module to access the provided functionality. This ensures that when the remote interface changes, your scripts don’t break and you only have to upgrade this module.

GRID_LRT.Staging.stager_access.stage(surls)[source]

Stage list of SURLs or a string holding a single SURL

Parameters: surls (either a list() or a str()) – either a list of strings or a string holding a single SURL to stage
Returns: an integer that is used to refer to the staging request when polling the API for a staging status

GRID_LRT.Staging.stager_access.get_status(stageid)[source]

Get status of request with given ID

Parameters:
  • stageid (int) – the ID of the staging request whose status you want
Returns: status, a string describing the staging status: ‘new’, ‘scheduled’, ‘in progress’ or ‘success’
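
Because a staging request moves through these statuses over time, a caller typically polls until it reaches ‘success’. The sketch below shows one possible polling loop; wait_for_staging and its parameters are hypothetical, and the status function is passed in so that the example runs with a fake in place of the real service. Only the four statuses listed above are assumed, so the loop is bounded by a maximum number of polls instead of relying on an undocumented failure status.

```python
import time


def wait_for_staging(get_status, stageid, poll_seconds=60, max_polls=100,
                     sleep=time.sleep):
    """Poll get_status(stageid) until it reports 'success' or polls run out."""
    for _ in range(max_polls):
        status = get_status(stageid)
        if status == "success":
            return True
        sleep(poll_seconds)
    return False


# Stand-in for stager_access.get_status so the sketch runs without credentials.
fake_statuses = iter(["new", "scheduled", "in progress", "success"])
done = wait_for_staging(lambda stageid: next(fake_statuses), stageid=12345,
                        poll_seconds=0)
print(done)
# True
```

With the real module, the call would presumably be wait_for_staging(stager_access.get_status, stageid), where stageid is the integer returned by stage.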

GRID_LRT.Staging.stager_access.get_surls_online(stageid)[source]

Get a list of all files that are already online for a running request with given ID

GRID_LRT.Staging.stager_access.get_srm_token(stageid)[source]

Get the SRM request token for direct interaction with the SRM site via Grid/SRM tools

GRID_LRT.Staging.stager_access.reschedule(stageid)[source]

Reschedule a request with a given ID, e.g. after it was put on hold due to maintenance

GRID_LRT.Staging.stager_access.get_progress()[source]

Get a detailed list of all running requests and their current progress. As a normal user, this only returns your own requests.

GRID_LRT.Staging.stager_access.get_storage_info()[source]

Get storage information of the different LTA sites, e.g. to check available disk pool space. Requires support role permissions.