C++ std::string_view Example

If you work in IT long enough you will learn that academics have overrun standards committees. std::string_view is an example of why an elephant is a mouse designed by academic committee. For certain there were good intentions behind this. Too many of you kids unwilling to go to college and take all of those hard classes to learn how to do things right were living via the “just throw code at it” mantra.

To get around lack of knowledge many were skimming those Effective C++ books and doing bad things. You were all searching for a one-and-done solution so you were declaring every pointer const and what it pointed to const because you didn’t really understand what side const should be on. Then you started using references everywhere, creating even more unreadable code because nobody could tell at a glance if the entity was being passed by value or by reference.

Worse yet, none of your code was re-usable. Cut & paste-able, yes, re-usable, no.

string_view

The idea behind string_view was to make those who never went to college less of a random firing loose cannon. Y’all fell into three camps:

  • Those who passed everything const so nothing further downstream could get a piece of or modify the entity. This made all data un-usable.
  • Those who passed everything via reference because they had no idea how to use pointers properly. This left all data un-protected.
  • Those who really couldn’t figure out pointers or references, so they passed everything by value, meaning nothing passed in could ever be modified.

I have walked into so many shops where the junior developers had one hammer. Everything they worked on had to be a nail no bigger than that hammer could hit.

Theoretically, you are now supposed to pass string_view to everything that will not modify the original string. This means you pass by reference and pointer only to functions/methods which will modify. Adopting this coding methodology was supposed to eliminate a great many landmines. Boy did it add some big ones back in!
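
A minimal sketch of that rule, assuming two hypothetical helpers that are not part of the listing in this post: count_words() only reads, so it takes a string_view, while shout() modifies its argument, so it takes a std::string reference. shout() assumes plain ASCII letters.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts space-separated words without copying
// or modifying the caller's characters.
std::size_t count_words(std::string_view text)
{
	std::size_t words = 0;
	std::size_t pos = text.find_first_not_of(' ');
	while (pos != std::string_view::npos)
	{
		++words;
		pos = text.find(' ', pos);               // end of the current word
		if (pos == std::string_view::npos)
			break;
		pos = text.find_first_not_of(' ', pos);  // start of the next word
	}
	return words;
}

// Hypothetical helper: the only function that modifies, so the only one
// that takes a non-const reference. ASCII letters only.
void shout(std::string &text)
{
	for (char &c : text)
		if (c >= 'a' && c <= 'z')
			c = static_cast<char>(c - 'a' + 'A');
}
```

count_words() accepts a std::string, a string literal, or a char array without copying the characters. shout() only compiles when handed a modifiable std::string, which at least tells the reader of the prototype who is allowed to change what.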

The Code

I created this example using Eclipse on Linux Mint.

cmake_minimum_required(VERSION 3.10)

# Set some basic project attributes
project (StringViewExample
	VERSION 0.1
	DESCRIPTION "A Hello World Project")

set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_STANDARD 17)  

# This project will output an executable file
add_executable(${PROJECT_NAME} StringViewExample.cpp)

# Create a simple configuration header
configure_file(config.h.in config.h)

# Include the configuration header in the build
target_include_directories(${PROJECT_NAME} PUBLIC "${PROJECT_BINARY_DIR}")

Lines 8 and 9, the two set(CMAKE_CXX_STANDARD...) calls, are the only thing I added to the generated “Hello World” project CMakeLists.txt file. Yes, I should have changed the description.

#include <iostream>
#include <string>       // std::string
#include <string_view>
#include <array>        // std::array used in fun2()
#include <algorithm>
#include <stdexcept>
#include <cstring>
#include "config.h"
using std::string;
using std::string_view;

void fun1( string_view view1);
void fun2( string_view view1);

int main(int argc, char **argv) {
	std::cout << "StringViewExample" << std::endl;
	std::cout << "Version " << StringViewExample_VERSION_MAJOR << "." << StringViewExample_VERSION_MINOR << std::endl;

	string str1 {"    I once saw a tiny tot have a tiny thought in a tiny room   "};
	fun1( str1);
	std::cout << "str1 after fun1() call |" << str1 << "|" << std::endl;

	fun2( str1);

	return (0);
}

/*
 * It is cheaper and easier to pass a string_view than a string or its pointer
 * or its reference when you have no need to change the string. This is far
 * less confusing than having to figure out a const pointer to a const string.
 * While our string is tiny, this could be a text file or gigantic XML message
 * from an Internet connection. You don't want to copy all of those characters
 * if you don't have to.
 */
void fun1( string_view view1)
{
	// we are only altering the view
	std::cout << "view1 passed to fun1() |" << view1 << "|" << std::endl;
	view1.remove_prefix(std::min(view1.find_first_not_of(" "), view1.size()));
	std::cout << "view1 after prefix removal |" << view1 << "|" << std::endl;
	view1.remove_suffix(view1.size()- view1.rfind("   "));
	std::cout << "view1 after suffix removal |" << view1 << "|" << std::endl;

	const string STR1 {"Just a constant string"};
	string_view view2 = STR1;

	view1.swap( view2);
	std::cout << "view1 after swap |" << view1 << "|" << std::endl;
	std::cout << "view2 after swap |" << view2 << "|" << std::endl;

	std::cout << "view2 last character is: " << view2.back() << std::endl;
	std::cout << "view2 first character is: " << view2.front() << std::endl;

	std::cout << "\n\nview1 max size: " << view1.max_size() << std::endl;
	std::cout << "view2 max size: " << view2.max_size() << std::endl << std::endl;

}

/*
 * The copy experiments.
 * These are dangerous, but not as dangerous as the standard C copy functions.
 */
void fun2( string_view view1)
{
	char c_buffer[2048] {};
	std::array<char, 800> longEnoughArray;
	std::array<char, 8>  tooShortArray;
	char too_short_c_buffer[10] {};

	std::cout << "view1 passed to fun2() |" << view1 << "|" << std::endl;


	longEnoughArray.fill('\0');
	view1.copy(longEnoughArray.data(), view1.size());
	/* only need to convert to string for output
	 * This will stuff all of the nulls in so you won't see the rest of the
	 * stream when running in Eclipse or most other tools.
	 * Iterators are evil. They have no concept of context.
	 */
	std::string str(longEnoughArray.begin(), longEnoughArray.end());
	std::cout << "the whole thing size: " << str.size() << " content |" << str;
	std::cout << "|" << std::endl;

	// accessing raw buffers of containers is really not cool
	std::string str2(longEnoughArray.data());
	std::cout << "str 2 size: " << str2.size() << "  content |" << str2
			<< "|" << std::endl;
	view1.copy(c_buffer, view1.size());
	std::cout << "c_buffer |" << c_buffer << "|" << std::endl;

	/*
	 * A much safer method of obtaining characters from the view
	 * is with the substr() method.
	 *
	 * Default parameters for substr() are 0 and the npos max value
	 * Sadly, to provide a length you must provide a starting position.
	 *
	 */
	std::cout << "\n\n***Substring stuff\n";
	std::cout << view1.substr( 0, view1.find("have")) << std::endl;
	const char *endText = "thought";
	size_t start_pos = view1.find("have");
	size_t sub_len   = view1.rfind(endText) - start_pos + 1 + strlen(endText);

	std::cout << "start_pos: " << start_pos << "  sub_len: " << sub_len
			<< "  content |" << view1.substr( start_pos, sub_len ) << "|"
			<< std::endl;

	std::cout << "\n\n****Now for the danger\n" << std::endl;

	try
	{
		// classic buffer overwrite: copy() never checks the destination size
		view1.copy( tooShortArray.data(), view1.size());
		std::cout << "tooShortArray |" << std::string(tooShortArray.data()) << "|\n" << std::endl;
	}
	catch (const std::exception& e)
	{
		std::cout << "copy to tooShortArray" << std::endl;
		std::cout << e.what() << std::endl;
	}

	try
	{
		// classic buffer overwrite
		view1.copy(too_short_c_buffer, view1.size());
		std::cout << "too_short_c_buffer |" << std::string( too_short_c_buffer) << "|\n" << std::endl;
	}
	catch (const std::exception& e)
	{
		std::cout << "copy to too_short_c_buffer" << std::endl;
		std::cout << e.what() << std::endl;
	}


	try
	{
		longEnoughArray.fill('\0');
		view1.copy(longEnoughArray.data(), 9999 );
		std::cout << "longEnoughArray after copy |" << std::string( longEnoughArray.data()) << "|\n" << std::endl;
	}
	catch (const std::exception& e)
	{
		std::cout << "copy to longEnoughArray" << std::endl;
		std::cout << e.what();
	}

	try
	{
		[[maybe_unused]]
		 std::string stuff(view1.substr(888, 25));
	}
	catch (const std::exception& e)
	{
		std::cout << "substr() from beyond end" << std::endl;
		std::cout << e.what() << std::endl;
	}

}

fun1()

Here I slam home the idea that we can do anything we want to the view without altering the source string. Take a look at line 20, the fun1( str1) call in main(), where I pass str1 which is a std::string. The age old complaint of references rears its head here. Without looking at the function prototype you can’t tell if str1 is passed by reference, by value, or as a view. Remember what I told you about the elephant earlier. Academics can’t help themselves. Almost none of them have worked in the real world, that’s why those of us in the real world call them academics.

The compiler creates a view for us, because std::string converts implicitly to string_view, and that view exists only locally in fun1(). As you see in fun1(), you can assign new values to the view without changing the string. More importantly, this is true even if we had created the view and passed it by reference to a function. You can change the view, but not the source.
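
To make the pass-by-reference case concrete, here is a small sketch. trim_spaces() is a hypothetical helper, not part of the listing. It uses find_last_not_of() rather than the rfind() call in fun1(), so it does not depend on the string ending in exactly three spaces.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical helper: trims spaces from both ends of the caller's view.
// The view is passed by reference, so the caller sees the change,
// but the characters it points at are never written.
void trim_spaces(std::string_view &view)
{
	// find_first_not_of() returns npos for an all-space view; std::min clamps it
	view.remove_prefix(std::min(view.find_first_not_of(' '), view.size()));

	// what is left is either empty or contains at least one non-space
	const auto last = view.find_last_not_of(' ');
	if (last != std::string_view::npos)
		view.remove_suffix(view.size() - last - 1);
}
```

Create a view over a std::string, hand it to trim_spaces(), and the view shrinks while the string keeps every one of its leading and trailing spaces.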

fun2()

Const pointers to const strings got a lot of developers in trouble when they wanted to get a substring or some other piece of a string (or any object for that matter.) You couldn’t always do it because the function parameters weren’t const pointers to const strings, just const.

The main purpose of fun2() is to show you that copy() isn’t any safer. In fact it is worse because you want to believe it is safer.
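
One possible way to get some protection back, sketched under assumptions: let the destination's size, not the caller, bound the copy. bounded_copy() is a hypothetical helper, not something the standard library provides, and it uses std::copy_n() instead of string_view::copy() so it never writes through a raw pointer.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper: copies at most N - 1 characters into dest and
// always null terminates. Returns the number of characters copied.
template <std::size_t N>
std::size_t bounded_copy(std::string_view view, std::array<char, N> &dest)
{
	static_assert(N > 0, "destination needs room for the terminator");
	const std::size_t count = std::min(view.size(), N - 1);
	std::copy_n(view.begin(), count, dest.begin());  // count < N, stays in bounds
	dest[count] = '\0';
	return count;
}
```

The trade-off is silent truncation: a 63 character view pushed into a std::array<char, 8> comes out as 7 characters and a terminator, so the caller should compare the return value against view.size() if losing data matters.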

Accessing the .data() buffer for writing is always bad!

I don’t care how “one with the compiler & language spec” you think you are. None of this works if the source string is not ASCII. Read up on string literals and how many bytes each character takes for each. data() is always nothing more than a block of bytes. What those bytes are, it hasn’t a clue. If the underlying data isn’t single byte ASCII encoding, you are screwed. That’s why so many C++ programmers use CopperSpice: higher level string classes that handle multiple data types.
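
A small sketch of the bytes-versus-characters problem, assuming the data is valid UTF-8. utf8_code_points() is a hypothetical helper written for this illustration; string_view itself offers nothing like it.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes the view holds valid UTF-8.
std::size_t utf8_code_points(std::string_view bytes)
{
	std::size_t count = 0;
	for (unsigned char b : bytes)
		if ((b & 0xC0) != 0x80)
			++count;
	return count;
}

// "naive" with a diaeresis on the i, spelled out as UTF-8 bytes.
// The i takes two bytes (0xC3 0xAF), so size() reports 6 for 5 characters.
inline constexpr std::string_view kNaive = "na\xC3\xAF" "ve";
```

With kNaive, size() is 6 while utf8_code_points() is 5, and substr(0, 3) ends in the middle of the two byte character. Every position and length in the listing is a byte offset, nothing more.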

No Overwrite Protection

Please look at the try-catch blocks in the bottom of the listing. Now look at the output.

StringViewExample
Version 0.1
view1 passed to fun1() |    I once saw a tiny tot have a tiny thought in a tiny room   |
view1 after prefix removal |I once saw a tiny tot have a tiny thought in a tiny room   |
view1 after suffix removal |I once saw a tiny tot have a tiny thought in a tiny room|
view1 after swap |Just a constant string|
view2 after swap |I once saw a tiny tot have a tiny thought in a tiny room|
view2 last character is: m
view2 first character is: I


view1 max size: 4611686018427387899
view2 max size: 4611686018427387899

str1 after fun1() call |    I once saw a tiny tot have a tiny thought in a tiny room   |
view1 passed to fun2() |    I once saw a tiny tot have a tiny thought in a tiny room   |
the whole thing size: 800 content |    I once saw a tiny tot have a tiny thought in a tiny room
str 2 size: 63  content |    I once saw a tiny tot have a tiny thought in a tiny room   |
c_buffer |    I once saw a tiny tot have a tiny thought in a tiny room   |


***Substring stuff
    I once saw a tiny tot
start_pos: 26  sub_len: 20  content |have a tiny thought |


****Now for the danger

tooShortArray | I once saw a tiny tot have a tiny thought in a tiny room |

too_short_c_buffer | I once saw a tiny tot have a tiny thought in a tiny room ny room |

longEnoughArray after copy | I once saw a tiny tot have a tiny thought in a tiny room |

substr() from beyond end
basic_string_view::substr: __pos (which is 888) > __size (which is 63)

You have over-read protection, but not overwrite.
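
The two halves of that sentence can be probed separately. This is a sketch with two hypothetical helpers, not code from the listing: substr_rejects() reports whether substr() throws for a starting position, and copy_clamped() asks copy() for 9999 characters and returns how many it actually wrote.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical probe: true when substr() refuses the starting position.
bool substr_rejects(std::string_view view, std::size_t pos)
{
	try
	{
		(void)view.substr(pos, 25);
		return false;
	}
	catch (const std::out_of_range &)
	{
		return true;
	}
}

// Hypothetical probe: the requested count is clamped to what the view
// holds. The caller must guarantee dest is at least view.size() long,
// because copy() has no way to know.
std::size_t copy_clamped(std::string_view view, char *dest)
{
	return view.copy(dest, 9999);
}
```

The source side is checked both ways: a position past the end throws and an oversized count is quietly cut down. The destination side is a bare char pointer, so its length never enters the picture.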

For more C++ tutorials, take a look at the one on C++ Arrays and Pointers.

Roland Hughes started his IT career in the early 1980s. He quickly became a consultant and president of Logikal Solutions, a software consulting firm specializing in OpenVMS application and C++/Qt touchscreen/embedded Linux development. Early in his career he became involved in what is now called cross platform development. Given the dearth of useful books on the subject he ventured into the world of professional author in 1995 writing the first of the "Zinc It!" book series for John Gordon Burke Publisher, Inc.

A decade later he released a massive (nearly 800 pages) tome "The Minimum You Need to Know to Be an OpenVMS Application Developer" which tried to encapsulate the essential skills gained over what was nearly a 20 year career at that point. From there "The Minimum You Need to Know" book series was born.

Three years later he wrote his first novel "Infinite Exposure" which got much notice from people involved in the banking and financial security worlds. Some of the attacks predicted in that book have since come to pass. While it was not originally intended to be a trilogy, it became the first book of "The Earth That Was" trilogy:
Infinite Exposure
Lesedi - The Greatest Lie Ever Told
John Smith - Last Known Survivor of the Microsoft Wars

When he is not consulting Roland Hughes posts about technology and sometimes politics on his blog. He also has regularly scheduled Sunday posts appearing on the Interesting Authors blog.
