There's the string header, which tells you how long the string is and where the underlying data is. That's why returning either a MyError or a *MyError is valid when you don't use a pointer receiver. In this particular case, using pointers will not make a difference. The string data contains no pointers, so it is not scanned. What is the difference with normal variables? I've blogged before about running into Garbage Collector (GC) problems caused by large heaps. When I've had issues with large heaps, the major causes have been the following. What can we do about this? It turns out that the Go memory manager knows what type each allocation is for, and will mark allocations that do not contain pointers so that the GC does not have to scan them. The string header is described by reflect.StringHeader, a struct with two fields: Data (a uintptr holding the address of the bytes) and Len (an int holding the length). So, it turns out that pointers are the enemy, both when we have large amounts of memory allocated on-heap, and when we try to work around this by moving the data to our own off-heap allocations. Well, there are two parts to it. In extreme cases it can fail to keep up. Quite a few times. We'll create 100,000,000 strings, copy the bytes from the strings into a single big byte slice, and store the offsets. Here's a tiny program to demonstrate. And then there's the underlying data, which is just a sequence of bytes. What is a string? Here's a small program to demonstrate the idea. We allocate a slice of a billion 8-byte ints; again, this is approximately 8 GB of memory. In this case, you may be losing quite a bit of potential performance to the GC. Not good.
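To make the string header mentioned above a little more concrete, here is a minimal sketch that measures the size of a string value and a slice value in machine words. The helper name headerWords is made up for this illustration, and the word counts in the comments assume the standard gc toolchain.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords is a hypothetical helper for this sketch. It reports how many
// machine words a string value and a slice value occupy.
func headerWords() (stringWords, sliceWords uintptr) {
	word := unsafe.Sizeof(uintptr(0))
	var s string
	var b []byte
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	s := "hello, world"
	sub := s[7:] // a new header pointing into the same bytes; no data is copied

	stringWords, sliceWords := headerWords()
	fmt.Println(sub, len(sub))
	fmt.Println("words in a string header:", stringWords) // pointer + length
	fmt.Println("words in a slice header: ", sliceWords)  // pointer + length + capacity
}
```

Each header carries a pointer to separately stored data, which is why a large slice of strings gives the GC a large number of pointers to follow.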
We'll then show that the GC time is still small, and demonstrate that we can retrieve the strings by showing the first 10. For mystrings, Len and Cap will each be 100,000,000, and Data will point to a contiguous piece of memory large enough to contain 100,000,000 StringHeaders. That's actually less than a nanosecond to check each pointer. How can memory be uninteresting? But you need to be able to spot them to avoid them, and they aren't always obvious. Finally, pointers also let you represent "nothingness", with each pointer being able to be nil. Using a pointer also prevents copying, which can be a performance improvement in very limited circumstances (do not pass pointers around all the time just because it might be a performance improvement). Underlying mystrings is a reflect.SliceHeader, which looks similar to the reflect.StringHeader we've just seen. What can we do about this? This is both a blessing and a curse, though, as you must check whether each pointer is nil before accessing it. This means that *MyError implements the error interface, so it can be returned from any function that declares an error return value. So, use pointers when passing arguments or when declaring methods if you want to modify the object, or if copying the object would be too expensive. Here's one way to look at it: In Go, all variables are passed by value.
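As a rough illustration of pass-by-value, the sketch below contrasts a function and a method that receive a copy with ones that receive a pointer. The type T and all four function names are invented for this example and are not taken from the original question.

```go
package main

import "fmt"

// T is a hypothetical type used only for this illustration.
type T struct{ value int }

// byValue receives a copy of its argument, so the caller's T is untouched.
func byValue(t T) { t.value = 1 }

// byPointer receives a copy of a pointer, so it can reach the caller's T.
func byPointer(t *T) { t.value = 1 }

// setOnCopy has a value receiver: it modifies a copy of the receiver.
func (t T) setOnCopy() { t.value = 1 }

// setInPlace has a pointer receiver: it modifies the receiver itself.
func (t *T) setInPlace() { t.value = 1 }

func main() {
	var a, b, c, d T
	byValue(a)
	byPointer(&b)
	c.setOnCopy()
	d.setInPlace() // Go takes the address for us: (&d).setInPlace()
	fmt.Println(a.value, b.value, c.value, d.value) // 0 1 0 1
}
```

Only the pointer variants change the caller's variable; the value variants operate on a copy that is discarded when the call returns.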
Note that this only works on Unix-like operating systems, but there are similar things you can do on Windows. https://blog.gopheracademy.com/advent-2017/unsafe-pointer-and-system-calls/. If your string takes only a few fixed values, consider using integer constants instead. If you are storing dates and times as strings, perhaps parse them and store the date or time as an integer. If you fundamentally need to keep hold of a lot of strings, read on. That means: above, t is passed by value. In fact, every time I've hit this problem I've managed to be surprised, and in my shock I've blogged about it again. This is only a small part of the story about pointer arguments. Suppose you've written an in-memory database, or you're building a data pipeline that needs a huge lookup table. Strings, slices and time.Time all contain pointers. If we do use off-heap allocations, then we need to avoid storing pointers to heap allocations unless these are also referenced by memory that is visible to the GC. I've allocated 1 billion pointers. Example code (https://tour.golang.org/methods/19): In this case it is using *MyError and &MyError, but when I remove the * and the &, it still works correctly. The Go Garbage Collector (GC) works exceptionally well when the amount of memory allocated is relatively small, but with larger heap sizes the GC can end up using considerable amounts of CPU. The huge array of string headers does contain pointers, so it must be scanned on every GC cycle. If we alter this to use a normally allocated []*int, we get the expected result.
I don't understand when to use pointers in Go. There's a lot to say about different strategies to deal with each of these. In this post I'll just talk about one idea for dealing with strings. What we give up by doing this is the ability to free up memory for individual strings, and we've added some overhead copying the string bodies into our big byte slice. We either hide the memory from the GC, or make it uninteresting to the GC. (Want to understand a := *(*[]*int)(unsafe.Pointer(&slice))? Take a look at https://blog.gopheracademy.com/advent-2017/unsafe-pointer-and-system-calls/). Here are some resources that you might find helpful in dealing with these issues. If we can avoid any pointers within the types we're allocating, they won't cause GC overhead, so we won't need to use any off-heap tricks. Here's the equivalent of our first program where we allocate a []*int with a billion (1e9) entries. Returning MyError wouldn't work on its own because MyError is not a *MyError. On my 2015 MBP I get the following output. Well, if all the string bytes were in a single piece of memory, we could track the strings by offsets to the start and end of each string in this memory. And we do that a few times to get a steady value. The StringHeaders that are contained in this slice, and the Data for each string, which are separate allocations, none of which can contain pointers. For simplicity, let's assume this is a single huge global var mystrings []string.
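The timing experiment described above (allocate, force a GC, time it, and repeat a few times) might look roughly like the sketch below. It is not the original program: the helper timeGC and the constant entries are assumptions made for this illustration, and the slice size is deliberately far smaller than a billion so that it runs on a modest machine.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// entries is an assumption chosen for this sketch; the text above uses 1e9.
const entries = 10000000

// timeGC forces a full collection and reports how long it took.
func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	// A slice of pointers: the GC has to examine every element.
	withPointers := make([]*int, entries)
	for i := 0; i < 3; i++ {
		fmt.Println("[]*int GC:", timeGC())
	}
	runtime.KeepAlive(withPointers) // keep the slice live during the timings

	// The same number of bytes with no pointers: marked as not needing a scan.
	withoutPointers := make([]int, entries)
	for i := 0; i < 3; i++ {
		fmt.Println("[]int GC: ", timeGC())
	}
	runtime.KeepAlive(withoutPointers)
}
```

Absolute timings depend on the machine and the Go version, so no particular output is claimed here; the point of interest is the ratio between the two sets of numbers.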
When you pass a string variable to a function, it is the string header that gets written to the stack, and if you keep a slice of strings, it is the string headers that appear in the slice. The strings themselves comprise two pieces. Which is a pretty good speed for looking at pointers. And this has bad consequences, which are tragically easy to demonstrate. If you were to remove * from func (e *MyError), you would be telling Go that Error() works on any instance of a MyError, which means that both *MyError and MyError would fulfill that contract. And why should that be surprising? If the GC insists on periodically scanning all the memory we've allocated, we'll lose huge amounts of the available processing power to the GC. In the example below we're allocating exactly the same amount of memory as before, but now our allocation has no pointer types in it. That seems like a fundamental problem. The GC is considerably more than 1000 times faster, for exactly the same amount of memory allocated. If we can arrange for our in-memory tables to have no pointers, then we're on to a winner. I am working through the Tour of Go, and I have a question about pointers. Pointers are a way of passing around a reference to a value, rather than the value itself, allowing you to modify the original value or "see" modifications to that value.
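The receiver discussion above can be sketched as follows. This is modelled on the Tour of Go exercise linked in the question, but it is a reconstruction for illustration and not a verbatim copy of that code.

```go
package main

import (
	"fmt"
	"time"
)

// MyError mirrors the shape of the type discussed in the question.
type MyError struct {
	When time.Time
	What string
}

// Error has a pointer receiver, so it belongs to the method set of
// *MyError only. A plain MyError value does not satisfy error.
func (e *MyError) Error() string {
	return fmt.Sprintf("at %v, %s", e.When, e.What)
}

// run returns a *MyError as an error. Returning MyError{...} without the &
// would fail to compile while Error keeps its pointer receiver.
func run() error {
	return &MyError{When: time.Now(), What: "it didn't work"}
}

func main() {
	if err := run(); err != nil {
		fmt.Println(err)
	}
}
```

Changing the receiver to (e MyError) would put Error in the method set of both MyError and *MyError, which is why removing the * and the & together still compiles.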
"When should I use pointers?" is a very large question without a simple answer. Go does some helpful things when dealing with method receivers: if the receiver is MyError, the method is in the method set of both MyError and *MyError, but if the receiver is *MyError, the method is only in the method set of *MyError. That is, when a value has to satisfy an interface, Go will not "create" a pointer for you out of thin air. We also call runtime.KeepAlive() to ensure the GC/compiler doesn't throw away our allocation in the meantime. When should I use pointers or not? What do we have here? Here we try to store the numbers 0, 1 & 2 in heap-allocated ints, and store pointers to them in our off-heap mmap-allocated slice. If our application needs a large in-memory lookup table, or if our application fundamentally is a large in-memory lookup table, then we've got a problem. Now, the memory here is invisible to the GC. That piece of memory contains pointers and hence will be scanned by the GC. What if the type of our allocated object doesn't contain pointers? We can try that. We then force a GC and time how long it takes. In large heaps, pointers are evil and must be avoided. We allocate a billion (1e9) 8-byte pointers, so approximately 8 GB of memory. In those scenarios, you may have gigabytes of memory allocated. If we ask the OS for memory directly, the GC never finds out about it, and therefore does not scan it. If you store a lot of these in memory it may be necessary to take some steps. Let's find out! String headers contain pointers, so we want to avoid storing strings! Let's say we're storing a hundred million strings. That means when f is called, the compiler creates a copy of t and passes that to f.
Any modifications f makes to that copy will not affect the t used to call f. When f instead takes a pointer, the compiler creates a pointer to t and passes a copy of that pointer to f. If f makes changes through that pointer, those changes will be made on the instance of t used to call f. In other words: with a value argument the program prints 0, because the modifications made by f are made on a copy of t. With a pointer argument it prints 1, because the call to f changes t. The same idea applies to methods and receivers: with a value receiver the program prints 0, because the method modifies a copy of t. With a pointer receiver it prints 1, because calling t.f() is equivalent to calling (&t).f(). The memory backing our ints is freed up and potentially re-used after each GC. It's the string headers which are a problem from a GC point of view, not the string data itself. The GC takes over half a second. In your specific example, the reason why returning &MyError works is that your Error() function operates on a value of *MyError (a pointer to MyError), rather than on a value of MyError itself. This time, we use the mmap syscall to ask for the memory directly from the OS kernel. The other thing we can do is hide the allocations from the GC. The GC's job is to work out which pieces of memory are available to be freed, and it does this by scanning through memory looking for pointers to memory allocations. Well, the GC is looking for pointers. How much of a problem? We force a GC after allocating and storing a pointer to each int. Will the GC still scan it?
This has the interesting consequence that pointers stored in this memory won't stop any normal allocations they point to from being collected by the GC. And here's our output. We essentially have two choices. Hopefully by reading this far you won't be surprised if it happens to your projects, or perhaps you'll even anticipate the problem! The principle here is that if you never need to free a string, you can convert it to an index into a larger block of data and avoid having large numbers of pointers. Copyright 2019, GopherAcademy; all rights reserved. I've built a slightly more sophisticated thing that follows this principle here if you are interested. To put it simply, if there are no pointers to an allocation then the allocation can be freed. By tracking offsets we no longer have pointers in our large slice, and the GC is no longer troubled. So our data is not as we expected and we're lucky not to crash. Doing this is a little more involved than our previous example! Why are they using pointers in this example? This works very well, but the more memory there is to scan, the more time it takes.
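The offset-tracking principle described above can be sketched like this. The type StringTable and its methods are hypothetical names for this illustration; this is not the author's library, just one simple way to arrange the data.

```go
package main

import "fmt"

// StringTable is a hypothetical type for this sketch. It holds two slices in
// total, however many strings are added, so the GC sees two pointers instead
// of one per string.
type StringTable struct {
	data []byte // every string's bytes, stored back to back
	ends []int  // ends[i] is the offset in data just past string i
}

// Add copies s into the table and returns its index.
func (st *StringTable) Add(s string) int {
	st.data = append(st.data, s...)
	st.ends = append(st.ends, len(st.data))
	return len(st.ends) - 1
}

// Get rebuilds string i from its offsets. The conversion copies the bytes,
// which is part of the overhead this approach trades for cheaper GC cycles.
func (st *StringTable) Get(i int) string {
	start := 0
	if i > 0 {
		start = st.ends[i-1]
	}
	return string(st.data[start:st.ends[i]])
}

func main() {
	var table StringTable
	for i := 0; i < 5; i++ {
		table.Add(fmt.Sprintf("string-%d", i))
	}
	fmt.Println(table.Get(0), table.Get(4)) // string-0 string-4
}
```

Individual strings cannot be removed from such a table, which matches the trade-off stated above: the ability to free single strings is given up in exchange for a pointer-free layout.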