ArcaPix GPFS Python API

ArcaPix’s GPFS Python API supports IBM’s GPFS (Spectrum Scale) platform. ArcaPix’s GPFS Python API is licensed under the ArcaPix License API 1.0.

Please consult the CHANGELOG.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

An Introduction to the ArcaPix Python GPFS API

Spectrum Scale (also known as GPFS) is a highly scalable, high-performance, clustered parallel file system. ArcaServe/PixStor builds on GPFS and provides a management GUI; however, the fullest control of the system is still via the command line. The ArcaPix GPFS Python API aims to provide a more consistent and easier-to-use approach. The API enables the full power of the Python language to be leveraged both interactively and when building scripts to automate common batch and management processes.

If you are already familiar with GPFS concepts such as Filesets, Snapshots and Policies, you may want to go straight to the reference documentation, which includes descriptions of the enhanced callback and policy processing functionality.

Obtaining the API

The GPFS Python API is only available to ArcaStream and Pixit Media (‘ArcaPix’) customers and is pre-installed on all new systems. Existing customers can install the API by applying updates or by contacting the support team.

Setup

The API is compatible with Python >= 2.7.6 and IBM GPFS (Spectrum Scale) 3.5.x, 4.0.x, 4.1.x and 4.2.x.

Important

On CentOS 6 systems, use only the supplied Python 2.7, not the system Python, which is version 2.6.

The API is provided as an RPM package to be installed on all server nodes in the cluster. Recent ArcaServe/PixStor installations are provided with the API pre-installed.

Getting started

To use the API interactively, simply start python, then import the Cluster module:

$ python2.7
Python 2.7.9 (default, May 15 2015, 14:50:45)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from arcapix.fs.gpfs import Cluster

The ‘object hierarchy’ can then be explored through collections of objects.

For example, a cluster has a collection of filesystems, and a filesystem might have zero or more snapshots:

>>> print Cluster().filesystems
mmfs1,mmfs2

>>> print Cluster().filesystems['mmfs1'].snapshots
global-snapshot1,TemporarySnapshot

To understand the available objects and how they are linked together, please refer to the Object Model Hierarchy diagram.

Properties

The API objects have various properties which you can interrogate, broadly matching those reported by the GPFS commands. Many of the objects provide a change() method to modify the object’s properties.

For example, to change the default mount point of a filesystem:

>>> Cluster().filesystems['mmfs1'].change(mountPoint='/gpfs1')

To explore which properties are available and which can be changed, refer to the documentation for the core objects – Cluster, Filesystem, Fileset, Snapshot, Quota, Node.

Certain objects also support an extended range of operations.

For example, to shut down GPFS on a node:

>>> Cluster().nodes['node1'].shutdown()

Creating and Deleting

Selected objects can be created or deleted from a parent collection.

For example, to create a new Snapshot of the filesystem (say, for use with a backup system):

fsys = Cluster().filesystems['mmfs1']
snap = fsys.snapshots.new('TemporarySnapshot')

# perform backup ...

fsys.snapshots.destroy(snap)
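
If the backup step raises an exception, the snapshot above would be left behind. One way to guard against that is a small context manager built on the same new() and destroy() calls. The following is only a sketch: temporary_snapshot and FakeSnapshots are hypothetical names that are not part of the API, and FakeSnapshots is an in-memory stand-in so the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def temporary_snapshot(snapshots, name):
    """Create a snapshot, yield it, and always destroy it afterwards."""
    snap = snapshots.new(name)
    try:
        yield snap
    finally:
        # Runs even if the backup step raises, so no snapshot is left behind.
        snapshots.destroy(snap)


class FakeSnapshots(object):
    """Hypothetical in-memory stand-in for fsys.snapshots (illustration only)."""

    def __init__(self):
        self.names = []

    def new(self, name):
        self.names.append(name)
        return name

    def destroy(self, snap):
        self.names.remove(snap)


fake = FakeSnapshots()
seen_during_backup = []
try:
    with temporary_snapshot(fake, 'TemporarySnapshot') as snap:
        seen_during_backup = list(fake.names)
        raise RuntimeError('backup failed')  # simulate a failing backup
except RuntimeError:
    pass

print(seen_during_backup)  # the snapshot existed while the backup ran
print(fake.names)          # and was removed despite the failure
```

On a real system the first argument would presumably be fsys.snapshots from the example above, in place of the stand-in.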

Event Handlers (aka Callbacks)

Certain objects provide the ability to add event handlers (aka callbacks) using the typical Python idiom:

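from arcapix.fs.gpfs import Cluster

def handler():
    # append a line to the log each time the filesystem is mounted
    with open("/tmp/logfile", "a") as f:
        f.write("Filesystem mounted\n")


# pass the function itself (no parentheses) to register it
Cluster().filesystems['mmfs1'].onMount.new(handler)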

Note

More than one handler can be specified for a given event and they will all be called. Event handlers persist between Python invocations until explicitly removed.

Callbacks are described in detail in the comprehensive Getting Started with Callbacks guide. For those who are not familiar with passing functions as values, as above, a short primer is provided in the Almanac.
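
As a quick illustration of passing functions as values, the sketch below registers two handlers on a toy event object and then calls them all. The Event class and its fire() method are hypothetical stand-ins written for this example, not the API's implementation. Unlike real event handlers, nothing here persists between Python invocations.

```python
class Event(object):
    """Hypothetical stand-in for an event such as onMount (illustration only)."""

    def __init__(self):
        self._handlers = []

    def new(self, handler):
        # The function object itself is stored; it is not called here.
        self._handlers.append(handler)

    def fire(self):
        # Call every registered handler, in registration order.
        return [handler() for handler in self._handlers]


def log_mount():
    return 'logged'


def notify_admin():
    return 'notified'


on_mount = Event()
on_mount.new(log_mount)     # note: log_mount, not log_mount()
on_mount.new(notify_admin)

results = on_mount.fire()
print(results)
```

Writing log_mount() with parentheses would call the function immediately and register its return value instead, which is the most common mistake with this idiom.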

Walking the Filesystem

Typically, walking a filesystem is a very slow operation. Spectrum Scale (GPFS) provides a much faster mechanism: it scans the filesystem metadata directly and utilises the parallelism offered by a multi-server cluster. Metadata scanning is achieved by defining a ManagementPolicy, which can hold multiple Rules. Policy Rules can migrate and delete files, define which performance tier data is written to and (in conjunction with the API) provide a more general-purpose capability, in a similar way to the event handlers (callbacks) above:

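# import path assumed to match that of Cluster; adjust if your installation differs
from arcapix.fs.gpfs import ManagementPolicy, ListProcessingRule

def listFilesPerUser(files):
    # count the number of files owned by each user id
    result = {}
    for f in files:
        result[f.userid] = result.get(f.userid, 0) + 1
    return result

p = ManagementPolicy()
p.rules.new(ListProcessingRule, 'user_counts', listFilesPerUser)
print p.run('mmfs1')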

The ListProcessingRule and the even more parallel MapReduceRule capabilities are described in detail in the Getting Started With List Processing guide.
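
To see what a processing function like the one above returns, it can be exercised on its own with hand-made records. This is a sketch only: FileRecord is a hypothetical stand-in that models just the userid attribute used above, and the sample names and ids are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the file records passed to the function;
# only the userid attribute used by the example is modelled.
FileRecord = namedtuple('FileRecord', ['name', 'userid'])


def listFilesPerUser(files):
    # count the number of files owned by each user id
    result = {}
    for f in files:
        result[f.userid] = result.get(f.userid, 0) + 1
    return result


sample = [
    FileRecord('a.txt', 1001),
    FileRecord('b.txt', 1002),
    FileRecord('c.txt', 1001),
]

counts = listFilesPerUser(sample)
print(counts)  # user 1001 owns two files, user 1002 owns one
```

Testing the function this way, before attaching it to a rule, avoids running a full policy scan just to find a typo.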

Dry Run

Dry run functionality is available to see which GPFS commands would be run by a given script or piece of code, without that code making any changes to the filesystem:

>>> from arcapix.fs.gpfs.execute import setDryRun, getCmdCache
>>> from arcapix.fs.gpfs import Filesystem

>>> setDryRun(True)

>>> fs = Filesystem('mmfs1')

>>> for s in fs.snapshots.values():
...     s.delete()

>>> print getCmdCache()
['mmdelsnapshot mmfs1 global-snapshot1',
'mmdelsnapshot mmfs1 snap_test']

Read-only commands will be run as normal, but any commands that would change the underlying Filesystem are caught during dry run mode.
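
The behaviour described above can be pictured with a toy command runner. This is a conceptual sketch only, not how the API is implemented: CommandRunner and its read_only flag are hypothetical, and no GPFS commands are actually executed.

```python
class CommandRunner(object):
    """Hypothetical runner illustrating dry run mode (not the API's code)."""

    def __init__(self, dry_run=False):
        self.dry_run = dry_run
        self.cache = []      # changing commands caught during dry run
        self.executed = []   # commands that were allowed to run

    def run(self, command, read_only=False):
        if self.dry_run and not read_only:
            # Record the command instead of running it.
            self.cache.append(command)
            return None
        # A real runner would invoke the command here.
        self.executed.append(command)
        return command


runner = CommandRunner(dry_run=True)

# A read-only listing still runs as normal.
runner.run('mmlssnapshot mmfs1', read_only=True)

# Deletions are caught and cached rather than executed.
for name in ['global-snapshot1', 'snap_test']:
    runner.run('mmdelsnapshot mmfs1 %s' % name)

print(runner.cache)
print(runner.executed)
```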

Caveats

Whilst the API strives to make managing a GPFS system as simple and ‘Pythonic’ as possible, there are a number of caveats which mean that very complex usages may behave unexpectedly. These are described in the ‘Warning’ sections of the relevant documents. In general, however, most common administrative tasks will work straight away.

Bugs, feature requests and patches

Please submit all such items to support@[arcastream|pixitmedia].com.

Licensing

ArcaPix’s GPFS Python API is proprietary commercial software.

You may not distribute the API to any third party.

For further information, please consult the LICENSE file included.

Queries regarding licensing should be forwarded to support@[arcastream|pixitmedia].com.
